JOURNAL OF THE AMERICAN 
STATISTICAL ASSOCIATION 


Volume 50 SEPTEMBER, 1955 Number 271








ARTICLES 
Statistics and Objective Economics . . . . Morris A. Copeland

Federal Trade Commission Report on Changes in Concentration in
Manufacturing . . . . M. A. Adelman

Colonial Social Accounting . . . . William O. Jones

Concepts Employed in Labor Force Measurements and Uses of Labor Force
Data . . . . A. Ross Eckler, Gertrude Bancroft, and Robert Pearl

Examination of Two Sources of Error in the Estimation of Net Internal
Migration . . . . Daniel O. Price

The Redesign of the Census Current Population Survey . . . . Morris H. Hansen,
William N. Hurwitz, Harold Nisselson, and Joseph Steinberg

Sampling Methods in the Yugoslav 1953 Census of Population . . . . S. S. Zarković

On Adjusting Sample Tabulations to Census Counts . . . . M. A. El-Badry and F. F. Stephan

The Application of Sampling Procedures to Business Operations . . . . Howard L. Jones

Estimation of the Brazilian Coffee Harvest by Sampling Survey . . . . W. L. Stevens

On the Reliability of Responses Secured in Sample Surveys . . . . Robert Ferber

The Cochran-Mosteller-Tukey Report on the Kinsey Study: A Symposium . . . . Alfred C.
Kinsey and Associates; Herbert Hyman and Paul B. Sheatsley;
A. H. Hobbs and R. D. Lambert; Nicholas Pastore and Jacob
Goldstein; Lewis M. Terman; Paul Wallin; W. Allen Wallis; and
William G. Cochran, Frederick Mosteller, and John W. Tukey

The Efficiency of Double Sampling for Attributes . . . . H. C. Hamaker and R. van Strik

Generality of Confidence Intervals for a Regression Function . . . . Edwin L. Crow

Distributions of Solutions of a Set of Linear Equations (With an
Application to Linear Programming) . . . . M. M. Babbar

Estimation of Parameters from Incomplete Data . . . . Frederic M. Lord

Truncated Binomial and Negative Binomial Distributions . . . . Paul R. Rider

Restriction and Selection in Samples from Bivariate Normal Distributions . . . .
A. Clifford Cohen, Jr.

Comparison of Some Non-parametric Tests against Normal Alternatives
with an Application to Life Testing . . . . Benjamin Epstein

On the Distribution of a Positive Random Variable Having a Discrete
Probability Mass at the Origin . . . . John Aitchison

Increasing the Effective Length of Short Time-Series for the Purpose of
Estimating Autoregressive Parameters . . . . Abbott S. Weinstein

On Generalizations of Tchebychef’s Inequality . . . . H. J. Godwin

The Randomization Theory of Experimental Inference . . . . Oscar Kempthorne


Statistical tiuteede ‘ 


BOOK REVIEWS 


Cochran, William G., Mosteller, Frederick, and Tukey, John W.,
Statistical Problems of the Kinsey Report . . . . Alfred C. Kinsey, et al.

Deane, Phyllis, Colonial Social Accounting . . . . William O. Jones


Kempthorne, Oscar, The Design and Analysis of Experiments . . . . George E. P. Box


Bliss, C. I., and Calhoun, D. W., An Outline of Biometry . . . . H. Fairfield Smith

Kempthorne, O., Bancroft, T. A., Gowen, J. W., and Lush, J. L.,
Editors, Statistics and Mathematics in Biology . . . . J. H. Bennett

Bennett, Carl A., and Franklin, Norman L., Statistical Analysis in
Chemistry and the Chemical Industry . . . . F. R. Himsworth

Anderson, Oskar, Probleme der statistischen Methodenlehre in den
Sozialwissenschaften . . . . Werner Z. Hirsch

Loève, Michel, Probability Theory . . . . Walter L. Smith

RAND Corporation, A Million Random Digits with 100,000 Normal
Deviates

Hoxie, R. Gordon, et al., A History of the Faculty of Political Science,
Columbia University . . . . William R. Pabst, Jr.

Enrick, Norbert Lloyd, Quality Control Through Statistical Methods . . . .
Robert J. Hader


ey Satine Editor, city o Birmingham Abstract of Statistics Number 
8, 1962-1954 . P . De VER SHOLES 


Cartter, Allan Murray, The Redistribution of Income in Postwar Britain . . . .
Selma F. Goldsmith

Tuck, R. H., An Essay on the Economic Theory of Rank . . . . Robert M. Solow


Waanine en 3. Seiebiiing Meat o « « « «© «66 & Nomen 


Galenson, Walter, Labor Productivity in Soviet and American Industry . . . .
Simon Rottenberg


Rennie FRED, Stiapaind in Punched Card Computing . R. Zioua 


Rios, Sixto, Introducción a los métodos de la estadística (Segunda parte) . . . .
Paul R. Halmos


ne ate Scaciiiiaitain OF oP ee N ian Income—1954 Edi- 
tion—A Supplement to the Survey of Current Business » « 





JOURNAL OF THE AMERICAN 
STATISTICAL ASSOCIATION 


The Editors welcome the submission of manuscripts for possible publication. 
They should be typewritten entirely double-spaced, including footnotes, and 
two copies should be sent to the Editor, W. Allen Wallis, 207 Haskell Hall, 
University of Chicago, Chicago 37. Books for review should be sent to the same 
address. Unsolicited book reviews are not accepted, but suggestions of titles for 
review are welcome. 


EDITOR 


W. Allen Wallis, University of Chicago
ASSISTANT TO THE EDITOR: ELAINE SMITH 


ASSOCIATE EDITORS 


Harold A. Freeman, Massachusetts Institute of Tech.
Nathan Keyfitz, Dominion Bureau of Statistics
George M. Kuznets, University of California
I. Richard Savage, National Bureau of Standards
Sidney S. Alexander, Columbia Broadcasting System


ADVISORY PANEL OF FORMER EDITORS 


William G. Cochran (1945-50), Johns Hopkins University
Frank A. Ross (1926-34, 41-45), Thetford, Vermont
William F. Ogburn (1920-25), University of Chicago
Frederick F. Stephan (1935-40), Princeton University


Errata: Readers and authors are urged to submit to the Editor notices of 
errors found in this or any previous volume. These will be published once 
a year, in the December issue. 





1955 OFFICERS, AMERICAN STATISTICAL ASSOCIATION 


President: Ralph J. Watkins

President-Elect: Gertrude M. Cox

Past President: Herbert Marshall

Vice Presidents: Rensis Likert, Henry Scheffé, John W. Tukey

Directors: W. J. Dixon, Churchill Eisenhart, Margaret J. Hagood,
Donald C. Riley, Martin R. Gainsbrugh, Alfred N. Watson

Secretary-Treasurer: Samuel Weiss

Members of the Council: Harry Alpert, C. I. Bliss, William J. Carson,
R. E. Comstock, Besse B. Day, Daniel B. DeLury, Lucile Derrick,
Murray Dorkin, S. M. Free, John E. Freund, George Garvy,
Frank A. Hanna, Boyd Harshbarger, William O. Jones, William H. Kester,
Clarence D. Long, John C. McKee, Felix E. Moore, Jack Moshman,
Horace W. Norton, Almarin Phillips, John R. Stockton, Conrad Taeuber,
W. Allen Wallis, Kenneth B. Williams, Seymour L. Wolfbein





A cumulative Index to Volumes 1-34, 1888-1939, may be obtained from the 
Office of the Secretary of the American Statistical Association. 

Reprints of all articles published since 1945 may be purchased from the 
office of the secretary. Articles which have appeared during the past twelve 
months are usually available in quantity. 








EDITORIAL COLLABORATORS 


Forman S. Acton
Princeton University
Stephen G. Allen
University of Minnesota
Fred C. Andrews
University of Nebraska
Leo A. Aroian
Hughes Aircraft Company
Kenneth J. Arrow
Stanford University
Thomas Atkinson
Federal Reserve Bank of Atlanta
Raghu Raj Bahadur
University of Chicago
Edward Barankin
University of California (Berkeley)
Leo Barnes
Prentice-Hall, Inc.
M. S. Bartlett
University of Manchester
Earl F. Beach
McGill University
Reinhard Bendix
University of California (Berkeley)
Martin J. Berger
National Bureau of Standards
Abram Bergson
Columbia University
Joseph Berkson
Mayo Clinic
Allan Birnbaum
Columbia University
David Blank
Columbia University
Julius R. Blum
Indiana University
Isadore Blumen
Cornell University
Donald J. Bogue
Miami University
Albert H. Bowker
Stanford University
G. E. P. Box
Imperial Chemical Industries, Ltd.
R. A. Bradley
Virginia Polytechnic Institute
Samuel H. Brooks
Johns Hopkins University
K. A. Brownlee
University of Chicago
Joseph G. Bryan
Massachusetts Institute of Technology
Irving W. Burr
Purdue University
Glenn L. Burrows
Bureau of Agricultural Economics
Uttam Chand
Indian Council of Agricultural Research


Douglas G. Chapman
University of Washington
Herman Chernoff
Stanford University
William Clatworthy
National Bureau of Standards
Ansley J. Coale
Princeton University
William G. Cochran
Johns Hopkins University
A. C. Cohen, Jr.
University of Georgia
William S. Connor
Johnson and Johnson
Warren N. Cordell
A. C. Nielsen Company
Louis J. Cote
Purdue University
Allen T. Craig
University of Iowa
Cecil C. Craig
University of Michigan
John H. Curtiss
American Mathematical Society
Joseph F. Daly
Bureau of the Census
George B. Dantzig
RAND Corporation
Walter L. Deemer, Jr.
Department of the Air Force
W. Edwards Deming
New York University
Edward F. Denison
Department of Commerce
Cyrus Derman
Columbia University
Aaron Director
University of Chicago
Wilfrid J. Dixon
University of Oregon
Thomas G. Donnelly
Dominion Bureau of Statistics
Harold F. Dorn
U. S. Public Health Service
Edward Dowd
Cornell University
Francis W. Dresch
Stanford Research Institute
Acheson J. Duncan
Johns Hopkins University
C. Howard Duncan
Creole Petroleum Corporation
D. B. Duncan
Virginia Polytechnic Institute
Charles W. Dunnett
American Cyanamid Company
David Durand
Massachusetts Institute of Technology








Meyer Dwass
Northwestern University
Robert Eisner
Northwestern University
George L. Edgett
Queens University
Hope T. Eldridge
United Nations
Benjamin Epstein
Wayne University
Mordecai Ezekiel
United Nations
Helen C. Farnsworth
Stanford University
Leon Festinger
University of Minnesota
Richard J. Foote
Bureau of Agricultural Economics
Lester R. Frankel
Alfred Politz Research, Inc.
Milton Friedman
University of Chicago
Irwin Friend
University of Pennsylvania
Walter Galenson
Harvard University
Erwin A. Gaumnitz
University of Wisconsin
David V. Glass
University of London
Leo A. Goodman
University of Chicago
Eugene Grant
Stanford University
Frank E. Grubbs
Aberdeen Proving Grounds
P. M. Grundy
Rothamsted Experiment Station
Margaret Gurney
Bureau of the Census
Margaret J. Hagood
Bureau of Agricultural Economics
Margaret Hamill
Ordnance Corps
Morris H. Hansen
Bureau of the Census
Arnold Harberger
University of Chicago
Harry H. Harman
The RAND Corporation
H. O. Hartley
Iowa State College
Millard W. Hastay
National Bureau of Economic Research
Willis L. Hasty
Camp Detrick, Maryland
J. L. Hodges, Jr.
University of California (Berkeley)
Wassily Hoeffding
University of North Carolina





Robert Hooke
Princeton University
Daniel G. Horvitz
University of Pittsburgh
Earl E. Houseman
Bureau of Agricultural Economics
Harry M. Hughes
University of California (Berkeley)
Rheem F. Jarrett
University of California (Berkeley)
George Jaszi
Department of Commerce
Emil H. Jebe
Iowa State College
R. J. Jessen
Iowa State College
Oscar Kempthorne
Iowa State College
David Kendall
University of Oxford
Maurice G. Kendall
University of London
Frank L. Kidner
University of California (Berkeley)
Jack Kiefer
Cornell University
Edgar P. King
Eli Lilly and Company
Dudley Kirk
The Population Council
Clyde V. Kiser
Milbank Memorial Fund
Leslie Kish
University of Michigan
Lawrence R. Klein
University of Oxford
Lloyd A. Knowler
University of Iowa
Albert R. Koch
Federal Reserve System
H. S. Konijn
University of California (Berkeley)
Carl Kossack
Purdue University
William H. Kruskal
University of Chicago
Ernest Kurnow
New York University
Stanley Lebergott
Bureau of the Budget
Ivan M. Lee
University of California (Berkeley)
J. M. Letiche
University of California (Berkeley)
Joseph Lev
New York State Department of Civil Service
Gerald J. Lieberman
Stanford University
Julius Lieblein
National Bureau of Standards





Abraham M. Lilienfeld
Roswell Park Memorial Institute
E. F. Lindquist
University of Iowa
Richard Link
Sandia Corporation
Sebastian Littauer
Columbia University
Clarence E. Long
Johns Hopkins University
E. Lord
British Cotton Industry Research Board
Eugene Lukacs
National Bureau of Standards
Sherman J. Maisel
University of California (Berkeley)
John Mandel
National Bureau of Standards
Eli S. Marks
National Opinion Research Center
Frank J. Massey, Jr.
University of Oregon
W. Parker Mauldin
Bureau of the Census
Quinn McNemar
Stanford University
Perry Meyers
Perry Meyers, Inc.
Don I. Mittelman
Diamond Ordnance Fuze Laboratory
Geoffrey H. Moore
National Bureau of Economic Research
P. G. Moore
University of London
P. A. P. Moran
Australian National University
J. E. Morton
National Science Foundation
Jack Moshman
Bell Telephone Laboratory
Frederick Mosteller
Harvard University
K. R. Nair
Forest Research Institute
John Neter
Syracuse University
Gottfried Noether
Boston University
Edward B. Olds
Social Planning Council of St. Louis
Edwin G. Olds
Carnegie Institute of Technology
Gladys Palmer
Wharton School of Finance and Commerce
Walter Palmer
Young and Rubicam, Ltd.
Edward Paulson
Queens College (N.Y.)


Eugene W. Pike
Raytheon Manufacturing Company
Irvin Pollin
Ordnance Corps
W. H. Poole
Young and Rubicam, Ltd.
John Pratt
University of Chicago
Joseph Putter
University of California (Berkeley)
Howard Raiffa
Columbia University
Stanley Reiter
Purdue University
Paul R. Rider
Wright Patterson Air Force Base
John Riordan
Bell Telephone Laboratories
Murray Rosenblatt
University of Chicago
Gideon Rosenbluth
Queen’s University (Canada)
Alan Ross
Iowa State College
H. L. Royden
Stanford University
Richard Ruggles
Yale University
Leonard J. Savage
University of Chicago
Henry Scheffé
University of California (Berkeley)
Harry Schwartz
The New York Times
Jacob S. Siegel
Bureau of the Census
Herbert A. Simon
Carnegie Institute of Technology
Rosedith Sitgreaves
Columbia University
Milton Sobel
Bell Telephone Laboratories
Robert M. Solow
Massachusetts Institute of Technology
Paul N. Somerville
Virginia Polytechnic Institute
R. Clay Sprowls
University of California (Los Angeles)
William A. Spurr
Stanford University
Joseph Steinberg
Bureau of the Census
J. S. Stock
Alfred Politz Research, Inc.
R. N. Stone
University of Cambridge
Alan Stuart
London School of Economics
G. H. Symonds
Esso Standard Oil Company





Victor von Szeliski
U. S. Economics Corporation
Conrad Taeuber
Bureau of the Census
D. Teichroew
National Cash Register Company
Benjamin J. Tepping
Bureau of the Census
Donovan J. Thompson
University of Pittsburgh
Donald R. Truax
Stanford University
John W. Tukey
Princeton University
David Valinsky
City College of New York
Rupert Vance
University of North Carolina
Rutledge Vining
University of Virginia
Howard E. Wahlert
New York University
David L. Wallace
University of Chicago
John E. Walsh
Lockheed Aircraft Corporation
Lionel Weiss
University of Virginia
L. Welch
University of Leeds
P. K. Whelpton
Miami University
Hugh H. Wolfenden
Minneapolis, Minnesota
F. Yates
Rothamsted Experiment Station










STATISTICS AND OBJECTIVE ECONOMICS* 


Morris A. Copeland
Cornell University 


One could argue that the history of economic thought has long been
shaped, at least in part, by statistics. In the nineteenth century
value theory was strongly emphasized, the theory of production rather 
slighted; at that time price statistics were far more plentiful than pro- 
duction statistics. Again laissez-faire was bolstered by a price, wage, 
and interest theory that largely ignored the government as a customer, 
as an employer, and as a borrower; at that time the government bud- 
get was small in relation to national income. Further, international 
economics emerged as a special field; international trade statistics and 
foreign exchange rates loomed large among our earlier economic time 
series. 

This is by no means all there is to the earlier influences of statistics 
on economics. But the major statistical impact has come since World 
War I, and it is that impact we are concerned with here. 

Let me begin by considering the nature of the stimulus that has been 
applied to economics. Really there are two stimuli. On the one hand, 
there is statistics as a scientific method of making observations and 
drawing inferences, particularly a method of inferring the characteris- 
tics of a population from the characteristics of a sample which is in 
some sense random. Modern statistical method is a widely applicable 
technique of scientific investigation that embraces the planning of the 
pattern of observation as well as the mathematical logic of inductive 
inference.

On the other hand, we have statistics in the sense of the facts of
observation themselves, or rather, since we are here concerned with
economic statistics, the facts which emerge from what is called
statistical collection and compilation. Statistical collection and compilation
necessarily goes farther than mere observation; it involves logical
inference and trained judgment. Moreover, this kind of fact finding is
commonly beyond the capacity of any one individual. It takes an
organization to find the population of the United States, and more than
one organization to find the gross national product. Yet such findings
of quantitative facts are for the economist the nearest available
analogues to the measurements of physical science. Possibly they are much
closer analogues than has sometimes been supposed, despite their
inferential nature. At all events economic statistics are the empirical
measurements the economist has to work with.

* An earlier version of this paper was presented on May 24, 1954 at a symposium sponsored by the
American Statistical Association, New York Area Chapter, in connection with the Bicentennial
Anniversary of Columbia University.

Since World War I there has been a very great increase in the stock 
of economic measurements economists have at their disposal. There 
has also been a very great development in statistical method. But the 
impact of statistics on economics has come almost entirely from the
measurement side; the greatly improved mathematical-statistical 
method has made but little impression on the recent course of economic 
thought. 

The reason why statistical measurements have exerted a substantial
influence is not far to seek. Economists had long aspired to make their 
subject a genuine science. For a time they had looked to Newtonian 
mechanics as an example of what a science ought to be. The aim was a 
mechanics of the market place, but the result turned out to be more 
like Euclid than Newton. Presently there were some that argued that 
geometry, rather than mechanics, was the proper model for economic 
science. But there were others who regarded the so-called laws of 
economics as too static, too deductive, and too remote from the real 
world, and who yearned to make economics an objective empirical study 
of our actual economy. 

Wesley Mitchell was a great leader in this latter group. In addressing 
the American Economic Association as its President, he outlined what 
he thought quantitative analysis, aided by “the increase of statistical 
data, the improvement of statistical technique, and the endowment of 
social research”, might do to—and for—economics. He predicted that 
“the men now entering upon careers of research may go far toward 
establishing economics as a quantitative science”. And he anticipated 
that moving in this direction would require “a recasting of the old 
problems into new forms amenable to statistical attack”. Further,
contrasting the Newtonian or mechanical and the statistical
conceptions of nature, he said the latter “may be expected to make more
radical changes in economics than it makes in physical theory”. He
commented too on the prospect for the traditional type of economics which
he characterized as “deductive” and as involving “excursions into the
subjective”. He thought it “unlikely that the quantitative workers
will retain a keen interest in imaginary individuals coming to imaginary
markets with ready-made scales of bid and offer prices. Their theories
will probably be theories about the relationships among the variables
which measure objective processes. There is little likelihood that the
old explanations will be refuted . . . much likelihood that they will . . .
drop out of sight in the work of the quantitative analyst”.¹ Mitchell’s
address was delivered in 1924.

Before World War I there had not been very much in the way of 
statistical measurements on either side of the Atlantic to implement 
the kind of objective quantitative inquiry Mitchell expected to see 
develop. And in many respects American economists had been less 
well off than their European colleagues. To be sure our Statistical Ab- 
stract was a substantial octavo volume, and it was literally then, as it 
is today, only an abstract. But the quantity of statistical measurements
is not a matter of weight and tale alone. They must be à propos.

Economic statistics, like other measurements, are by nature
comparative. A single measurement can be quite useless; usually you need
other measurements with which to compare it. In the case of economic
statistics there is one type of comparison that has long been recognized
as particularly important: comparisons in time. Economic time series,
such as the series on our wheat crop each year, are a particularly useful
class of economic statistics. And within this class there is one subclass
that possesses far greater utility than the others: economic time
series that are on a current quarterly or more frequent basis. This
kind of measurement is essential for analyzing the current business
situation. Before World War I we had only a thin, scattered
assortment of such series, e.g., imports and exports, foreign exchange rates,
stock, bond, and wholesale commodity prices, bank clearings in leading
centers, freight-ton-miles, business failures, pig-iron production.² There
were no comprehensive current figures on employment and unemployment,
on inventories, on construction, on retail trade or retail prices,
on bank credit. The idea of an index of physical production was widely
deemed not feasible. Monthly measures of total personal income would
have been thought an idle dream. The gross national product was
only a theoretical concept mentioned by Adam Smith and then largely
forgotten. It had not yet been given even an annual or occasional
statistical significance.

¹ American Economic Review, March 1925, 1-12. There are various other quotations from this
address below.

² See Historical Statistics of the United States, 1789-1945, Washington: Bureau of the Census, 1949,
Appendix I.

This listing of gaps in the current quarterly and monthly time series 
available before World War I makes it clear that our area of ignorance 
was much larger than our area of knowledge. In terms of annual, de- 
cennial, and other noncurrent compilations we were considerably better 
off. Such compilations covered agriculture, mining, manufacturing, 
railroads, banks, government, and several smaller industry sectors. 
In the decade 1904-13 about three fifths of our national income orig- 
inated in these sectors. Moreover, there were clues to what was going 
on in the others, notably the clues in the decennial census of occupa- 
tions. But it should be added that the statistical record of our pre- 
World-War-I economy is substantially more complete today than it 
was in 1918. A good deal of what we now know has been pieced together 
in the meantime. Piecing together is an essential part of the work of 
statistical fact finding, and a part that has come to be performed far 
more adequately in the last thirty odd years. 

This building up of the retrospect has meant a significant addition 
to the stock of statistical measurements at our disposal. But most 
economists would rate as far more significant what has been accom- 
plished by way of building up the current picture. With regard to the 
magnitude of this accomplishment it can be said that we now have what 
we lacked before, an effective working stock of current time series, a 
far better one than that possessed by any other country in the free 
world. There are still gaps to be filled, and there are shaky figures 
that need to be made firm. But it is a sufficiently well rounded and re- 
liable stock to make effective empirical economic analysis possible. 

The measurements of physical science are mostly ad hoc measure- 
ments. When an investigator needs a measurement he finds a way to 
make it. Scientific curiosity is in general the motive that provides such 
measurements. Scientific curiosity has played a part, too, in making 
our stock of economic measurements what it is today. But its role in 
this connection has necessarily been a modest one. I have noted that the 
retrospect has been improved by a process of piecing data together. 
This piecing-together process is responsible, also, for a vast improve- 
ment in our current statistical picture. Without it that picture would 
still be a very spotty one. And scientific curiosity has been a main mo- 
tive in the development of the piecing-together techniques. However, 
without the basic current and less frequent periodic reporting services 
that have come into being since World War I, there would not be much 
that could be pieced together. And scientific curiosity has hardly been
the main promoter of these services. Probably the most that can be
said is that it has abetted curiosity-with-an-axe-to-grind. But even 
curiosity-with-an-axe-to-grind has played a modest role. It is true 
that ad hoc collection services provide a substantial part of our basic 
data. The various censuses, the monthly payroll and employment re- 
ports to the Bureau of Labor Statistics, and current trade association 
reports are cases in point. But there is a much larger body of basic 
data collected or recorded for some administrative purpose. For ex- 
ample, there are tax returns and the accounting records of businesses 
and governments. In the vast assortment of basic data we rely on to- 
day the statistics that are by-products of administrative record keeping 
and administrative reports bulk far larger than do those that result 
from ad hoc collections. 

We will not attempt to explain how our stock of by-product measure- 
ments has come to be what it is today. Such an inquiry, however in- 
triguing, would be a digression. But the comments just made on the 
role of scientific curiosity in adding to our stock of measurements have 
been offered because they help to answer a directly pertinent question 
that has surely occurred to you: If the economic measurements ac- 
cumulated since World War I have exerted a substantial influence on 
the course of economic thought, because there was a group of econo- 
mists who yearned to make their subject an empirical science, how 
comes it that the greatly improved mathematical-statistical method, 
which now offers to all fields of science a plan of inductive inquiry, has 
made but little impression? 

Let us consider this question under two heads, one relating to the 
problems of the accuracy of economic measurements; the other, to the 
problems of constructing economic models. 

If we think of statistics as a genus of quantitative facts, it would 
seem that economic statistics should be regarded as a species of that 
genus. Now the proponents of modern statistical method regard the 
members of the genus as numerical characterizations of what they call 
a population or universe, or of a sample of a population or universe. 
Presumably the figure $365 billion for our gross national product in 
1953 is a numerical characterization of a particular universe, the 1953 
national product, and since to some extent sample data were used in 
arriving at this figure, it is natural for the proponents of statistical 
method to ask, “Why not use the theory of sampling to provide a 
measure of its accuracy?” 

Certainly it is not the practice to say that GNP was $365 billion
±2%. In fact national income figures were often published in this
form thirty-odd years ago, but the practice has gone out. Most
economic statistics are now published without any attempt to specify the
error quantitatively.³ There are exceptions to this rule, of course. But
the rule is against such a specification.

The ground for this rule is surely not that the degree of accuracy 
attained by economic statistics is so high; quite the contrary. One
reason for the rule is that the errors modern mathematical statistical 
method has enabled us to measure are not the only errors to which eco- 
nomic statistics are subject. Often they are relatively unimportant. 
For one thing there are sampling errors that have a time dimension 
that is not yet covered by probability theory—I mean errors which 
arise because a sample-universe relationship derived from study of a 
benchmark year is applied to sample data for subsequent periods, and 
in these subsequent periods the sample may have gotten out of line 
with the universe to an extent not reliably measured by any available 
statistic. There are conceptual errors too—thus the articulation of the 
parts of a total like GNP may not be quite correctly designed. The 
articulation of the parts may be imperfect for this reason. It may be
imperfect also because the basic data do not fully conform to standard 
specifications, e.g., the basic data used in putting together a consoli- 
dated balance sheet for our banking system may not all refer to exactly 
the same date; or they may not all define foreign banks in the same 
way. Further, there is always the possibility of sheer mechanical 
errors. And of course there are incomplete and doubtful data that must 
be used in some of the steps taken in arriving at a comprehensive total 
like GNP. Again, the data available are continually changing—prob- 
ably no two successive annual estimates for a total like GNP are 
exactly comparable. This is not a complete catalogue of sources of 
error, but it is perhaps enough to indicate why attaching a single 
percentage error measure derived from a probability calculation to 
many current economic statistics would only be misleading. Instead, 
in the case of time series, the prevailing practice is to mark some
figures p (preliminary) and others r (revised).
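
The benchmark-year error described above can be made concrete with a small sketch. It is not taken from this paper: the function names and every figure below are hypothetical, chosen only to show how a universe-to-sample ratio fixed in a benchmark year can drift out of line when the sample grows more slowly than the universe.

```python
def benchmark_ratio(universe_benchmark, sample_benchmark):
    """Universe-to-sample ratio observed in the benchmark year."""
    return universe_benchmark / sample_benchmark


def extrapolate(sample_values, ratio):
    """Blow up later sample totals with the fixed benchmark ratio."""
    return [ratio * s for s in sample_values]


def percent_error(estimates, truths):
    """Signed percentage error of each estimate against the true total."""
    return [100.0 * (e - t) / t for e, t in zip(estimates, truths)]


# Hypothetical figures, invented for illustration only.
# Year 0 is the benchmark year; in later years the universe grows
# faster than the firms that happen to be in the sample.
ratio = benchmark_ratio(1000.0, 100.0)
sample_totals = [100.0, 104.0, 108.0]
true_universe = [1000.0, 1060.0, 1125.0]

estimates = extrapolate(sample_totals, ratio)
errors = percent_error(estimates, true_universe)
for year, (est, err) in enumerate(zip(estimates, errors)):
    print(f"year {year}: estimate {est:.0f}, error {err:+.1f}%")
```

In practice the true universe totals are not known until the next benchmark, which is why no sampling-error formula computed from the current sample alone would reveal a drift of this kind.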

But, as it applies to time series, there is another reason for the rule 
against showing the percentage error. The main interest here centers 
not on one absolute error but on various relative errors. What counts 
is not so much the absolute level of a comprehensive total like
disposable personal income but its year-to-year and quarter-to-quarter
movements and its relations to other totals, e.g., personal consumption
expenditure and personal saving. Attaching a separate percentage error
figure to each quarterly estimate of disposable income would not be a
satisfactory way to indicate the accuracy of either quarter-to-quarter or
year-to-year movements in this series. Nor would separate percentage
error figures attached to both disposable income and consumption
estimates be too helpful in appraising the accuracy of a measure of the
relation between them.

* Thus none of the current monthly figures regularly appearing in the Survey of Current Business
is accompanied by such a specification, although many of them are based on sample data.
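
The point about level errors and movement errors can be illustrated with a small sketch. The figures below are hypothetical and chosen only for illustration; they are not taken from any published series. The sketch compares a series whose levels are uniformly 3 per cent too low with one whose levels are off by only 1 per cent but with alternating sign.

```python
# Hypothetical quarterly "true" values of a total such as disposable income
# (illustrative numbers only, not actual data).
true_values = [200.0, 202.0, 204.0, 206.0]


def pct_change(series):
    """Quarter-to-quarter percentage changes of a series."""
    return [100.0 * (b - a) / a for a, b in zip(series, series[1:])]


# Case 1 (assumed): every estimate is 3 per cent too low, a persistent level error.
biased = [0.97 * v for v in true_values]

# Case 2 (assumed): level errors of only 1 per cent, but alternating in sign.
signs = [0.01, -0.01, 0.01, -0.01]
noisy = [v * (1.0 + e) for v, e in zip(true_values, signs)]

true_moves = pct_change(true_values)
biased_moves = pct_change(biased)
noisy_moves = pct_change(noisy)

for label, moves in [("true", true_moves), ("biased", biased_moves), ("noisy", noisy_moves)]:
    print(label, [round(m, 2) for m in moves])
```

Under these assumptions the series with the larger level error reproduces the true movements, while the series with the smaller level error reports a decline where the true series rises. A single percentage error attached to each level would not reveal this difference.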

These comments indicate why the scientific error measuring tech- 
nique that has been elaborated by modern mathematical statistical 
method has not made much impression in the field of economic statis- 
tics. They also have an affirmative implication that deserves attention 
here. Every basic compilation of statistical data and every set of pieced- 
together economic measurements ought to be accompanied by an 
adequate descriptive statement. Such a statement is the best available 
substitute for a quantitative appraisal of errors; but it should serve a 
still more important purpose too. It is needed to tell the user exactly 
what the figures mean. 

This affirmative implication has two edges—it asserts an obligation 
on the producers of statistics, and a corresponding obligation on those 
who use them in economic analysis. 

The producer has an obligation to provide an adequate description 
of his product. In the case of those basic data that are by-products of 
administrative records or reports this obligation may rest on the 
processor, if the primary producer has not met it properly. And with 
respect to the processing or piecing-together, the obligation should 
read like this: Specify what you have done so fully that you could 
expect others to repeat the process and come out with substantially 
the same findings. If economists want to aim at something like the kind 
of objectivity in their measurements that attaches to the measure- 
ments of physical and biological science, this is surely the sort of stand- 
ard they should set for themselves. But it suggests a material qualifica- 
tion on what has been said above about our progress in developing 
economic measurements. We now have a lot of measurements but with 
regard to many of them, if we are candid, we must admit that we do 
not know enough about them to reproduce them. Undoubtedly there 
has been substantial progress in statistical specification statements, 
but to conform to the objectivity canon proposed there is still a long 
way to go. And there are pressures that work against going that way 
very rapidly. The preparation of a statistical specification statement is 
a tedious job that cannot well be delegated to a clerk. Time and money 
can be saved by slighting it and there are both time and money
pressures. Only a fraction of the statistical public will object if the
specification is delayed a few years, or if it is quite brief and vague.
Very few will object if they cannot repeat the process and confirm the results.
However, those few do something to bring about the correction of 
errors in economic time series. And they exert some pressure—perhaps 
disproportionate to their numbers—toward better specifications. 

The other edge to the implication from my comments on the nature 
of the errors in economic measurements relates to the users. Those who 
use such measurements in economic analysis have an obligation to 
understand the meaning of the figures they use, i.e., to know what they 
are doing. No doubt this statement will strike many non-economists as 
an unnecessary amplification of the obvious. But, unfortunately, a 
kind of division of labor has grown up under which some people make 
it their business to know in detail how concepts like GNP, personal in- 
come, and personal saving are defined, and to play a part in the piecing- 
together process of constructing economic measurements; others prefer 
to concentrate on exploring the relationships among these economic 
variables. With such a division of labor, one can contemplate the possi- 
bility that Mr. X may find a new and better formula for predicting 
what is currently called “personal saving” from “disposable income” 
and yet not realize that what he has learned to predict is really not 
personal saving but a mixture of household saving, institutional saving 
and additions to noncorporate business surplus, a mixture that does 
not necessarily reflect all of household saving. It can be argued that 
such an unhappy possibility is a necessary cost of progress in economic 
inquiry; both the constructing of statistical measurements and eco- 
nomic model building can be very time-consuming occupations. But the 
validity of this argument is open to question. Economic model building 
is still a kind of job that can be done by people who are not full time 
specialists. I think those who concentrate on analyzing relationships 
among variables whose meanings they have not really stopped to in- 
vestigate are unnecessarily impatient to get results. They may get 
results by impatience; but for the longer pull the obligation to know 
about the figures they are using remains. 

The other part of the question posed above as to why the recent de- 
velopments in mathematical statistical method have exerted but little 
influence on the course of economic thought relates to model-building. 
Present-day statistical method offers a kind of guidebook to model- 
builders. Why hasn’t this guidebook done more to shape the course of 
economic model-building? One reason is quite simple. The guidebook 
proposes a plan for taking observations. It tells how to sample a
universe so as to get a reliable model for it. Economic measurements are
the quantitative observations the economist wants to get, and he 
particularly wants time series. What he wants is not a sample of a time 
series, but the whole series for a period of years, if he can manage to 
get it. Often he would like it for a longer period than circumstances 
permit. In taking time-series measurements he takes all he can get. The
guidebook is no help. It is true there are areas of economic investiga- 
tion where ad hoc observations can advantageously be made and a plan 
of observation is needed. It is true, too, that there are other fields of 
inquiry in which the possibilities of ad hoc observation are more nar- 
rowly confined than in economics, e.g., history and paleontology. None 
the less, the part of economics that is not helped by a mathematical 
statistical plan for taking observations is a substantial part of the 
whole. 

But there is a second and more fundamental reason for the somewhat 
limited usefulness of the new mathematical-statistical guidebook in 
economic model building. This reason involves both the nature of the 
guidebook and the nature of economics. The guidebook aims at general 
applicability to all fields of inquiry and general applicability implies 
that the problems of the various fields (say physics, biology, and
economics) conform closely to a single pattern. We contend that they
do not, and that the guidebook, since it has been designed mainly for the
physical and biological sciences, is not particularly well designed to 
serve the purposes of the economist. The economist’s inquiry problems 
differ from those of the physicist and biologist because of two special 
characteristics of many of the time series with which he has to work. 
These two characteristics apply peculiarly to social and economic statis- 
tics. They are of special concern to the economist, because economic 
time series constitute the vast bulk of social and economic time series. 
It will be convenient to refer to series in which the two special char- 
acteristics are prominent as one-culture time series. 

Per capita disposable income in the United States at 1947 prices, 
1929-53, may be taken as an illustration of a one-culture time series. 
Let us contrast this type of series with such a series as total annual 
precipitation at Chicago, 1929-53. The first special characteristic that 
distinguishes the former is that the geographical specification is more 
than a mere geographical specification; it is a culture specification as 
well. And culture specification is important, because cultural differ- 
ences can pose major obstacles to making interspatial comparisons. 
Thus while there are doubtless serious difficulties in saying what per 
capita income in the United States in 1929 would be equal to $1500
in 1950, such a comparison seems quite safe and simple when we face
the problem of trying to say how many dollars of per capita income in 
the United States in 1950 would be equal to a per capita income of 
£200 in the United Kingdom in that year. But cultural differences do 
not hamper the making of interspatial comparisons of meteorological 
or other physical measurements. Because the geographical specification 
is partly a culture specification in the case of many economic measure- 
ments, and because one-culture intertemporal comparisons of such 
measurements are often so much easier and safer than intercountry 
comparisons, one-culture time series are today the outstandingly im- 
portant category of economic measurements. Hence, too, economic 
model builders—to the extent that they have operated empirically— 
have inevitably devoted most of their attention to time-series or period 
analysis models. 

The second special characteristic of a one-culture time series is that 
its meaning may be gradually changing as the culture, of which it is 
an aspect, evolves. So, too, if we think of a set of one-culture time series 
as a set of variables, the relations among these variables may be grad- 
ually changing. Because of this characteristic the economic model- 
builder faces a theoretical dilemma. Theoretically he can attempt to 
develop a period analysis model that will describe the course of cultural 
evolution. Alternatively he can assume that the evolutionary process is
sufficiently gradual so that he can get useful results if he ignores it. 
The first alternative offers no prospect of early success; it is strictly a 
theoretical alternative in the present state of our understanding.

The second and only practical alternative today is rather awkward. 
The economic model builder necessarily works with time series covering 
a finite period and fits his model to the observations for that period.
He would like to assume that these observations constitute a repre- 
sentative sample of a longer period, and to use his model to draw in- 
ferences about that longer period. But because he is following the 
second alternative he cannot get out of the past a random sample of a 
period that includes the future. He cannot use the methodological 
guidebook of mathematical statistics to appraise the validity of such 
inferences. Instead he adopts a rule of thumb to determine the confi- 
dence with which a stable relationship among variables during the 
period for which observations are available can be used to draw infer- 
ences about a longer period. The rule is that confidence diminishes as 
the length of the extrapolation period increases. 
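
The rule of thumb can be illustrated with a minimal sketch under stated assumptions. It supposes a hypothetical relation between two variables whose slope drifts slowly over time (the function `drifting_slope` and all figures are invented for illustration), fits a constant slope to the first twenty periods, and measures how far the fitted relation strays as it is carried forward.

```python
def drifting_slope(t):
    """Hypothetical slowly evolving relation: the slope rises 0.002 per period."""
    return 0.80 + 0.002 * t


periods = range(30)
x = [100.0 + 2.0 * t for t in periods]          # assumed explanatory variable
y = [drifting_slope(t) * x[t] for t in periods]  # assumed dependent variable

# Fit a constant slope (least squares through the origin) to periods 0..19 only.
fit_window = range(20)
b_hat = sum(x[t] * y[t] for t in fit_window) / sum(x[t] ** 2 for t in fit_window)


def pct_error(h):
    """Percentage error of the fitted relation h periods beyond the fit window."""
    t = fit_window[-1] + h
    return 100.0 * abs(y[t] - b_hat * x[t]) / y[t]


errors = {h: pct_error(h) for h in (1, 5, 10)}
for h, e in errors.items():
    print(f"{h:2d} periods ahead: error {e:.2f} per cent")
```

In this sketch the error grows with the length of the extrapolation because the relation itself keeps drifting away from the fitted one. Nothing in the fitted period supplies a probability measure for that drift, which is the sense in which the rule is a rule of thumb.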

In view of the gradual-evolution characteristic of one-culture time 
series the rules of mathematical statistical inference can not be applied
to extrapolations. But the economic model-builder can still use various
statistical techniques much as if they did. He can and does. In particu- 
lar he uses mathematical best-fit techniques to determine his parame- 
ters and on occasion he postulates probability distributions for his 
error terms and makes his parameters functions of time. On the other 
hand there are many rather mundane techniques employed by statis- 
tically minded economists to the development of which mathematical- 
statistical theory has made negligible contributions. Some of these are 
used in the piecing together process of statistical fact-finding and in 
designing and operating index numbers (e.g., splicing); others in 
analyzing time series variations (e.g., calendar adjustments). Such 
mundane techniques may well have done rather more for scientific 
method in economics than the mathematical-statistical guidebook. 

The restrictions on the applicability of mathematical statistical 
method we have been considering relate to time-series analyses. With 
respect to cross-section analyses I shall stop only to say that the use- 
fulness of mathematical statistical method varies with the problem. 
There are problems for which it seems ideally designed. There are 
problems for which its applicability is even more seriously restricted 
than for those connected with time series. And there are many prob- 
lems in between. 

We have attributed the very limited impact of recent improvements 
in mathematical-statistical method on the course of economic thought 
to the fact that that method is not well adapted to the problems of 
inquiry in the field of economics. The direct impact would, in any case, 
be limited to those problems which lend themselves to statistical treat- 
ment. Even within this area it is further restricted because there is a 
group of economists engaged mainly in statistical-economic research, 
who feel modern mathematical-statistical techniques to be malapropos 
and make a point of avoiding their use. Still, so long as there is a sub- 
stantial group who do use them, one might well look for a significant 
impact. 

There is, indeed, at least an incipient change in economic thought 
that can properly be attributed primarily to statistical theory. In the 
older—still widely held—deductive, subjective conception of an 
economic model the endogenous variables were all uniquely determined. 
With the error terms that characterize the statistically fitted equations 
of empirical model analyses some measure of indeterminacy is begin- 
ning to replace this older absolute determinism. 

So much for the slight impact of statistical method on the course of 
economic thought. What has been the nature and extent of the impact
of statistical measurements? Have they given economics the sort of
objectivity Mitchell had in mind? In considering this question I must 
in fairness note that I omitted some of the qualifications he attached 
to his forecast. One of them suggests a kind of obduracy in the de- 
ductive, subjective approach. He expected “quantitative analysis (to) 
produce radical changes in economic theory” but he warned us that it 
“does not promise a speedy ending of the types of economic theory to 
which we are accustomed”. We have had radical changes; the “excur- 
sions into the subjective” continue, although the rationalizations it 
is their purpose to provide have no proper place in the theoretical 
framework of any true behavioral science. 

Following Keynes the problems of economic theory have been recast. 
The field is now divided into two main parts, macroeconomics and 
microeconomics. In macroeconomics, quantitative analysis has made 
great progress. Quantitative workers now have at their disposal an 
impressive array of basic economic concepts which have something 
like the objectivity that attaches to the concepts of physics, in that 
they are defined operationally, i.e., in terms of the way they are
measured. Among these are production (i.e., deflated GNP), wealth, 
personal income, personal consumption expenditure, gross private 
domestic capital formation, money (i.e., the total currency and deposit 
liabilities of the banking sector to other sectors), bank credit, the whole- 
sale price level, unemployment, the labor force. These operationally 
defined macroeconomic concepts have given economists a set of ob- 
jective variables whose behavior they can investigate empirically. 

Quantitative workers conceive the task of macroeconomic theory to 
be to develop equations which relate these economic variables; to use 
these equations as well as a vast amount of detailed facts, to analyze 
the past behavior of our actual economy as it is reflected in changes in 
the aggregative variables; and, on the basis of such analyses to form 
opinions about the future. 

Consider one of these equations, the consumption function. This 
equation is not as yet a very perfect instrument; while a number of 
improvements have been made in it in the past decade, much work still 
remains to be done. In contrasting the mechanical and the statistical 
conceptions of nature, Mitchell emphasized “imperfect approximation”
as characterizing the latter. The consumption function is unmistakably 
an imperfect approximation. But quantitative analysts have found it 
very useful nonetheless. It is what is called a behavioristic equation, 
i.e., it is a statistical best fit to the available measurements. Historically
it is interesting that when Keynes proposed the consumption function
he indulged in subjective analysis; it is interesting, too, that this kind
of rationalization of consumer behavior has dropped “out of sight in
the work of the quantitative analysts” on improving the consumption
function and that these quantitative analysts have introduced
lags and stickiness factors commonly slighted by those who stress
subjective rationalizations. But these quantitative analysts do not have
the field to themselves. And their neglect of the subjective has not gone
without protest.

4 It would of course be scientifically appropriate to investigate, on the basis of an operational
definition of rational behavior, how much of human behavior is rational.
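
What it means to call the consumption function a statistical best fit can be sketched as follows. The simple linear form C = a + bY, the ordinary least squares criterion, and the five observations are all assumptions made for illustration; the text does not specify which form or fitting method any particular analyst used.

```python
# Hypothetical annual observations (illustrative only, not actual data):
# disposable income Y and consumption expenditure C, in billions of dollars.
income = [100.0, 110.0, 120.0, 130.0, 140.0]
consumption = [92.0, 100.0, 109.0, 116.0, 126.0]


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


a, b = fit_line(income, consumption)

# The residuals are the "imperfect approximation": what the fitted line misses.
residuals = [c - (a + b * y) for y, c in zip(income, consumption)]

print(f"C = {a:.2f} + {b:.3f} * Y")
print("residuals:", [round(r, 2) for r in residuals])
```

The fitted parameters describe observed behavior only; they carry no account of individual motives. The nonzero residuals are what makes the equation an imperfect approximation rather than an identity.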

Many a macro-economic theorist who has been accustomed to de- 
fining his basic concepts in terms of individual choices and expecta- 
tions, has continued to prefer such subjective definitions to objective, 
operational ones. Quite possibly he feels the latter to be lacking in 
logical precision; certainly they smack of “imperfect approximation”. 
And the shift from subjective to operational definitions would require 
a fundamental change in habits of thought that is not easily made ex- 
cept when one is young. 

Those who prefer subjective definitions in general prefer a corre- 
sponding type of economic analysis. Thus there is today a considerable 
group of economists who see no merit in a purely behavioristic equation
like the consumption function. They argue that there are “in
fact two concepts of propensity to consume”. One they characterize as
“formal”, “aggregate”, and “ex post”.5 This is the one we have been
considering. Though they call it ex post it has been used in making
projections into the future. It might, therefore, be better to call it
objective. The other concept is not amenable to statistical investigation.
The economists who insist on it distinguish it as “psychological”,
“individual”, and “ex ante”. They tend to deprecate the objective,
statistically determined propensity as “the tautological concept of the
marginal propensity to consume”,6 and to say that a theory of business
cycles that depends on it reduces to “a definitional proposition of no
significance”.7 There is also a slightly milder condemnation of this
kind of quantitative cyclical analysis which asserts that it tends “to
submerge the process of economic change” in static or “instantaneous
pictures”.8

5 Haberler, Gottfried, “Mr. Keynes’ theory of the ‘multiplier’”, reprinted in Readings in Business
Cycle Theory, Philadelphia: Blakiston, 1944, 193-202.

6 Machlup, Fritz, “Period analysis and the multiplier theory”, reprinted in Readings in Business
Cycle Theory, Philadelphia: Blakiston, 1944, 203-18.

7 Fellner, William, “Employment Theory and Business Cycles”, Section 6, in A Survey of Contemporary
Economics, Philadelphia: Blakiston, 1948, 53-5.

8 Williams, John H., “An appraisal of Keynesian economics”, American Economic Review, Supplement,
May, 1948, 288-9.

These reactions are of course reactions to Keynesian model analysis.
Those who have labored to construct and to improve a statistical model
along Keynesian lines are quantitative analysts. And the criticisms 
of this type of analysis just cited are in effect criticisms of any purely 
behavioristic analysis, Keynesian or otherwise. Those who bring the 
charges of tautology and static pictures appear to hold that a theory 
of the cycle must be something more than an hypothesis that approxi- 
mately describes or fits the observed facts and that can be used in 
making projections; that it must indeed explain or rationalize what happens
in terms of hypothetical individual expectations and preferences.
On this ground Mitchell’s own hypothesis for the cycle was long ago
criticized as not a theory. Of course this was mere name-calling; not
an intellectual criticism. But the recent critics of behavioristic quan- 
titative economic analysis would have been better advised to say “not 
a theory” than to say “tautology” or “static”. They are vulnerable on 
the truism count themselves. Besides, the charges of tautology and 
statics cannot be sustained against a Keynesian statistical model. 
Whatever else may be said about this type of model it is not a defi- 
nitional proposition devoid of empirical significance and not a static 
picture. No tautology could possibly lead logically to a wrong factual 
conclusion as did the model which produced the VJ Day forecast; nor 
could a static model. It takes a period analysis model to produce a 
prediction. 

Perhaps these comments on behavioristic quantitative analysis and 
its critics suggest that economic theorists can be classified as either 
behaviorists or subjectivists; they cannot. There are not a great many 
quantitative analysts today who are, as Mitchell thought they some
day would be, “chary of deserting the firm ground of measurable
phenomena for excursions into the subjective”. And plenty of those who 
insist the role of hypothetical individual expectations and preferences 
in economic analysis is a basic one, themselves engage extensively in 
the work of analyzing objective economic measurements. 

I have used the statistical consumption function as an example of 
a behavioristic equation. But the class of aggregative equations it 
illustrates is not yet very large. There are, of course, a considerable 
number of current input equations in the Leontief model. Also there 
are the import function (a kind of step-child of the consumption func- 
tion); the personal income size distribution pattern (the equation for- 
merly used to state this pattern was called Pareto’s Law); and the 
secular pattern in the functional distribution of national income. There 
are few others. 

These not too stable behavior patterns may be contrasted with another
much more stable type of equation that I propose to call conceptual.
Conceptual equations do not contain parameters that have been
determined as statistical best fits as do behavioristic equations; and 
most such equations can be applied to future periods with great con- 
fidence. 

More economists have disparaged conceptual equations than have 
disparaged the consumption function. It is because they have used 
the designation “definitional” in this connection to imply tautology 
that I suggest the substitute word, conceptual. Despite disparagement, 
conceptual equations are today the mainstay of quantitative economic 
analysis. And despite the implication of tautology a number of them 
are not quite true; each of these imperfect equations contains an
error term much as does the consumption function. There are three 
broad classes of conceptual equations: those that assert that a whole 
equals the sum of its parts; those that assert that product equals 
multiplicand times multiplier; and those that assert a balance of debits 
and credits. 

The second kind of conceptual equation is used in analyzing a dollar 
volume time series into a price-index series and a physical volume series. 
This form of factoring analysis was somewhat widely used in the 
early 1920’s; it was also somewhat widely and seriously abused. With 
the vast increase in available data and with a better dissemination of 
the necessary know-how, the use of this kind of quantitative analysis 
has been greatly extended during the past thirty years, and it has in 
general been tightened up by far more attention to detail. Abuses 
have not disappeared; but the grosser ones are now quite generally 
recognized as such. 
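
The factoring of a dollar volume into a price index and a physical volume index can be sketched as follows. The two-commodity data are hypothetical, and the pairing of a current-weighted (Paasche) price index with a base-weighted (Laspeyres) quantity index is an assumption made for illustration; the text does not say which index formulas were used in practice.

```python
# Hypothetical two-commodity economy (illustrative figures only).
base_prices = {"wheat": 2.0, "steel": 50.0}
base_quantities = {"wheat": 100.0, "steel": 10.0}
current_prices = {"wheat": 2.5, "steel": 55.0}
current_quantities = {"wheat": 110.0, "steel": 12.0}


def value(prices, quantities):
    """Dollar volume: sum of price times quantity over all commodities."""
    return sum(prices[k] * quantities[k] for k in prices)


base_value = value(base_prices, base_quantities)
current_value = value(current_prices, current_quantities)
value_ratio = current_value / base_value

# Paasche price index: current-period quantities as weights.
paasche_price = current_value / value(base_prices, current_quantities)
# Laspeyres quantity index: base-period prices as weights.
laspeyres_quantity = value(base_prices, current_quantities) / base_value
# Laspeyres price index: base-period quantities as weights.
laspeyres_price = value(current_prices, base_quantities) / base_value

exact_product = paasche_price * laspeyres_quantity
inexact_product = laspeyres_price * laspeyres_quantity

print(f"value ratio              {value_ratio:.4f}")
print(f"Paasche P x Laspeyres Q  {exact_product:.4f}")
print(f"Laspeyres P x Laspeyres Q {inexact_product:.4f}")
```

With this pairing the product of the two indexes reproduces the dollar volume ratio exactly. Pairing two base-weighted indexes leaves a small gap, which is one way a factor equation of this kind can carry an error term.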

Before taking up the other kinds of conceptual equations, let me 
call attention to an issue—or rather a group of issues—that the in- 
creased use of statistics has injected into economics. If these issues can 
be summed up in a single question it is this, “Which should the quanti- 
tative economic analyst emphasize, his models or the complexity and 
ever changing nature of the world he is investigating?” One of the issues 
that stems from this question has already been noted, i.e., How fully 
should the quantitative analyst understand the economic measure- 
ments with which he works? Another and deeper one is, Should he 
devote his efforts to developing and improving a comprehensive 
model—say one along Keynesian lines, or a Leontief model? Alterna- 
tively, should he assume that such a schematic approach would be 
likely to impose unwise constraints on his inquiries? Or should he take 
some middle ground? I shall not stop to list other issues that stem
from the basic question of what to emphasize in quantitative analysis.
Suffice it to say that together they divide the quantitative workers,
not into two distinct camps, but rather into something like a spectrum.

Equations that assert the whole equals the sum of its parts make 
possible an obvious but basic type of economic analysis. Thus one who 
wants to understand the behavior of a total like the Federal Reserve 
index of industrial production may wish to resolve it into its various 
industry components, and may, to bring out some influence at work 
during a particular period, focus his attention on a specially designed 
grouping of the components that are sensitive to this influence. 
Economists who emphasize the complexity and changing nature of 
their subject are likely to rely heavily on this kind of quantitative 
analysis, and to feel that preoccupation with a comprehensive model 
might be a serious handicap when it comes to discovering significant 
special purpose groupings of components. Equally, those who emphasize
models may hold that preoccupation with ad hoc component
analyses and a detailed historical approach is prejudicial to the dis- 
covery of stable economic behavior patterns. Probably both groups are 
right. 

Aggregative debit-credit equations are social accounting equations. 
Most of the development of this type of equation—and it has been a 
substantial one—has come in the last thirty years. Hardly anybody 
has bothered to call either component equations or factor equations 
tautological. This charge has been directed against the social account- 
ing equations. And the fact that they have been singled out in this 
way seems to be a tacit recognition of their high theoretical importance. 

There are two reasons why a social accounting equation should not 
be considered a definitional equation. First, every variable in a social 
accounting equation is either actually or potentially directly meas- 
urable, and thus has or can have an operational definition that is 
independent of the equation.9 Second, a social accounting equation is
significant theoretically because it expresses a significant economic 
adjustment. Thus the gross savings and investment account (gross 
S=gross I), against which the charge of tautology has most frequently 
been brought, reflects the equilibrating of supply and demand in the 
loan and security markets, although in the form in which it appears 
in the Department of Commerce national income and product accounts 
the supply of and demand for funds are not brought out very clearly.10
Anyhow it should be obvious that this equation is not merely true by
definition; the error term in the 1948 account for the United States was
5 per cent of gross investment.

9 Admittedly when a social accounting equation is used to provide a residual estimate of a variable,
it serves pro tem as a definitional equation. But this is not its main purpose; indeed its analytical usefulness
may well be increased when a direct estimate or measure of the variable is developed.

10 The writer has discussed this point more fully in A Study of Moneyflows in the United States, New
York: National Bureau of Economic Research, 1952, especially pp. 246-60.
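
The arithmetic behind such an error term can be sketched briefly. The two figures below are hypothetical, chosen only so that the discrepancy comes to 5 per cent of gross investment; they are not the actual 1948 accounts.

```python
# Hypothetical figures in billions of dollars (NOT the actual 1948 accounts).
# Each side is assumed to be measured independently of the other.
gross_saving = 38.0
gross_investment = 40.0

# If the equation were true by definition, this difference would always be zero.
discrepancy = gross_investment - gross_saving
discrepancy_pct = 100.0 * discrepancy / gross_investment

print(f"statistical discrepancy: {discrepancy:.1f} billion")
print(f"as a share of gross investment: {discrepancy_pct:.1f} per cent")
```

A nonzero discrepancy can arise only because both sides have operational definitions that are independent of the equation. That independence is what distinguishes a social accounting equation from a definitional one.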

Because social accounting equations have substantial theoretical 
significance, the recent rapid growth of social accounting has exerted 
an important influence on the development of economic thought. Let 
us consider briefly how. 

For one thing, it has helped to correct a formerly somewhat prev- 
alent misconception. Before we had much in the way of social ac- 
counting measurements it was frequently supposed that aggregate 
demand might be less than or greater than gross national product. 
This misconception took various forms, but I shall cite only one of 
them, a theory of the cycle that had many adherents in the late ’20’s. 
Its authors, Messrs. Foster and Catchings, stated it in terms of what 
they called “the annual equation” or “balance of output and de- 
mand”. This is clearly a social accounting equation. But they held, 
“The year is the shortest period of time within which we may reason- 
ably hope to approach closely to a balance”. And “The annual equation 
may be upset. As a matter of fact, every recession in business activity 
is marked by this kind of overproduction or by the fear that it is 
imminent”. With the growth of national income and product statistics 
and of the practice of assigning these measurements a central place in 
aggregative analysis, economists have become cautious about suggest- 
ing any imbalance in a social account except a statistical discrepancy. 
Causal hypotheses involving a social accounting imbalance have been 
largely replaced by causal hypotheses depending on an imbalance in 
the subjective realm of hypothetical individual plans and expectations. 

But the really important influence of social accounting on economic 
thought has been the constructive one. Before we had much in the way 
of social accounting measurements, aggregative analysis was rather 
like a ship without chart and compass. Current business analysis it 
was called at the time. It lacked a central core of basic concepts around 
which inquiries could be organized. The available current time series 
were appraised as business indicators or business barometers; the 
primary considerations in the appraisal were somewhat mechanically 
determined properties—cyclical lags and leads and cyclical sensitivity. 
Various selections of series with approved properties were combined 
according to various more or less arbitrary formulas into indexes of 
business activity. During the last 20 years the system of social accounts 
has come very largely to replace these indexes. Today a major consideration
in appraising a time series is its relation to this system: the GNP
account and the sector or demand accounts that interlock with it. The
system portrays the economic circuit which, at least since the time of
Quesnay and Smith, had been recognized as playing a central role in
allocating resources and determining the composition and distribution
of product in the more industrialized countries. The accounts have
added precision to our understanding of the circuit, and they have
given us measures of the main component flows of which it consists.
Some time series report component flows, and are appraised on the
basis of their roles in the circuit. Others, like prices and interest rates,
are appraised in terms of the way they influence the circuit flow. Still
others, like employment and unemployment, are regarded as resulting
from the operation of the circuit. Thus the social accounts have helped
to give aggregative economic analysis a sense of direction and a balanced
perspective.

11 Foster, William Trufant and Catchings, Waddill, Profits, New York: Houghton Mifflin Company,
1925, 249-50.

Note that I say, “helped”; they have not done this all alone. A state- 
ment of an accounting balance is not a mere truism; but it is a rather 
colorless assertion, even when it asserts a balance of supply and de- 
mand. It reports a correlation; it does not, by itself, indicate causation. 
Aggregative analysis has more of a sense of direction—and a sharper 
focus, too—than mere statements of accounting balance could impart. 
This improved direction and focus are to an important extent the results 
of a wide acceptance of a causal hypothesis proposed by both Keynes 
and the Stockholm school—the hypothesis that changes in the level of 
GNP are brought about mainly by changes in aggregate demand. 
Thus time series that report components of aggregate demand have a 
central place in aggregative quantitative analysis, and interpretations 
of the government account, the S and I account, the rest of the world 
account, and the personal account are directed toward explaining the 
behavior of the four major components of aggregate demand—govern- 
ment demand, private domestic investment, net foreign demand, and 
personal consumption expenditure. Without the social accounts there 
could be no such interpretations; without the hypothesis of the general 
primacy of aggregate demand the interpretations would not focus in 
this way. 

Mitchell’s forecast suggested that “the men now entering upon
careers of research may go far toward establishing economics as a
quantitative science”. Despite the lingering defects in our statistical
specifications and despite the persistence of subjective rationalizations
of economic behavior there can be little doubt that the advance in
quantitative aggregative analysis that has already taken place represents
a long step toward the development of a quantitative economic
science. And the period of the forecast still has twelve or fifteen years
to run.

Progress has been substantial; but it has also been lopsided. Super- 
ficially, at least, micro-economic analysis seems to have been largely 
immune to the transforming influence of statistics, and empirical 
explorations of this phase of theory have not been very numerous. It is 
true something has been done toward developing statistical demand 
and cost curves; but there was rather more interest in statistical de- 
mand curves thirty years ago than there is today. And the most signifi- 
cant development in micro-economic model analysis that has occurred 
in the past thirty-odd years, that relating to monopolistic competition,
has not to date proven very amenable to statistical exploration. 

It is tempting to suggest a kind of cumulative disequilibrium theory 
to explain this lopsidedness. The progress of quantitative empirical 
analysis in the macro-economic field seemed to offer a prospect of more 
progress. Consequently the field attracted those economists who had 
an aptitude for quantitative analysis, and they neglected the cultiva- 
tion of micro-economics. To the extent that such a theory has merit— 
and it seems to fit the declining interest in statistical demand curves— 
the lopsidedness is a defect that time should help to cure, and perhaps is
already in process of curing.

But this is probably only a partial explanation of the lopsidedness. 
The push to divide economic theory along the macro-micro line came 
from persons interested in macro-economics. The division facilitated 
work in macro-economics; it was in some ways awkward for the micro- 
economist. His concern is presumably with the various parts of the 
economy rather than the whole. But most of these parts had long be- 
fore been assigned to various special fields, e.g., labor problems, agri- 
culture, transportation and public utilities, private finance, industrial 
combination and competition. And the effective application of quanti- 
tative analysis to a particular micro-economic problem is likely to 
call for detailed familiarity with one of these special fields. The bulk of 
the statistical work on micro-economic problems that has been done 
during the last three decades has been done by workers in the special 
fields. Certainly the result has been a substantial improvement in our 
understanding of the several parts of the economy, but the accomplish- 
ment here is a piecemeal affair that consists of a multitude of scattered 
bits of new knowledge. The cobweb theorem and the Federal Reserve
elements analysis are striking, though hardly typical, examples. It
would be difficult to give a reasonable summary characterization of
what has been accomplished and I shall not attempt one. But it seems
fair to say that the many scattered bits do not yet add up to a radical 
change comparable to that in macro-economics. 

These comments, so far as they go, attribute the contrast between 
macro-economic and micro-economic quantitative analysis to influences 
that derive from the existing division of academic labor. But I suspect 
the contrast is due partly, too, to subject matter. In the macro-eco- 
nomic field a problem has been singled out that is peculiarly amenable 
to statistical investigation, the problem of quarter-to-quarter and year- 
to-year changes in aggregate demand. The question is, “How do you 
account for the cyclical behavior of this aggregate quantity?” And a 
very large part of the answer can apparently be given in terms of the 
behavior of other economic variables, i.e., in quantitative terms. A 
very large part of the answer; but not all of it. Qualitative factors, such 
as technological or legal changes, often have to be taken into account. 
Still I think the fact that workers could afford to rely heavily on quan- 
titative analysis in investigating business cycles has been a highly 
significant contributing circumstance in the progress this type of anal- 
ysis has made. And since the progress has been lopsided, it is perti- 
nent that the circumstances are lopsided, too. In the micro-economic 
field qualitative factors loom much larger. Witness the amount of time 
a specialist in labor, in railroads, in banking, or industrial combination 
and competition devotes to a study of the law as it impinges on his 
field. If the progress of micro-quantitative analysis has not been very 
spectacular, it is partly because many of the problems of the special 
fields of economics do not lend themselves to a predominantly quantita- 
tive approach. 

Mitchell’s address was primarily a forecast of the development of 
objective quantitative analysis. But he took care to emphasize the 
complementary relationship between qualitative institutional analysis 
and quantitative analysis and the bearing of both on questions of 
policy and economic welfare. He said in part, “quantitative work can- 
not dispense with distinctions of quality. . . . Indeed qualitative work 
itself will gain in power, scope and interest as we make use of . . . more 
reliable measurements”. Out of “quantitative economics... we may 
expect to come a close scrutiny of our pecuniary institutions and our 
efficiency in producing and distributing goods. .. . Economists will 
concentrate ... to an increasing degree upon economic institutions. 
... Quantitative analysis promises... to increase the range of ob- 
jective criteria by which we judge economic welfare”. 

This portion of his forecast, to date at least, seems to have been
wrong. During the first third of the twentieth century economic theorists
did a large amount of valuable work on the qualitative analysis of
pecuniary and other economic institutions and their significance for 
economic welfare, but there has been very little of this type of inquiry 
since the appearance of Keynes’s General Theory. True, a good deal 
has been accomplished in detail in the special fields, but statistics do 
not appear to have contributed any special impetus to the accomplish- 
ment. And in the field of theory they have, if anything, helped to 
divert attention from qualitative, institutional inquiries. 

Recently, however, the outlook for this type of work has brightened. 
We now have a new division of academic-economic labor. An increasing 
number of economists are concentrating their efforts on the problems 
of economic development and on comparing the institutional structures 
of different economies. In this area of research the complementary 
relationship between qualitative, institutional analysis and quantita- 
tive analysis is particularly close. It is too early yet to appraise the 
impact of statistical measurements here for we are only beginning to 
assemble a stock of this kind of information, and the field is still very 
new. But its problems clearly call for a balanced combination of quali- 
tative and quantitative analysis and this combination may some day 
have important repercussions on general economic theory. Quite 
possibly there are to be found here the makings of a delayed response 
to the statistical stimulus somewhat along the lines Mitchell visualized.12
Certainly he was right when he said “qualitative analysis...
cannot be dispensed with.”





12 Cf. comment by Domar, Evsey D., “Methodological developments” in A Survey of Contemporary 
Economics, Vol. II, New York: 1952, 455. 





FEDERAL TRADE COMMISSION REPORT ON CHANGES 
IN CONCENTRATION IN MANUFACTURING* 


M. A. ADELMAN 
Massachusetts Institute of Technology 


STATISTICS relating to the number, size distribution, and other
characteristics of the four-million-odd business firms in the United
States are of obvious importance for public policy and the advancement
of economic knowledge. The Federal Trade Commission is in
part an original source of basic statistics, in larger part a “primary 
processor” of raw data supplied by others. In discharging both functions 
it has issued studies of concentration in the American economy. 

The last previous study was characterized in this Journal as “this
unimpressive monograph.”1 On the average, it is doubtful that anything
better can be said for their latest effort; some of it is novel and
useful, some of it is neither, and some (the most publicized portion) 
is egregiously bad. More important than any over-all judgment, how- 
ever, is a detailed examination. 

The Report deals with changes in concentration in individual (four- 
digit) industries, and concentration in “manufacturing as a whole.” 
For individual industries, the Report reprints the concentration ratios 
for 1947 compiled by the Census Bureau and published by the Celler 
Committee in 1949. For one third of all industries a comparison was 
possible with 1935. The contribution of the Report in this area consists 
of a chapter (pp. 22-9) of running comment, too brief to be useful, 
on the statistics. There is no attempt at a measure of central tendency.2

* Report of the Federal Trade Commission on Concentration in Manufacturing, Washington: Government
Printing Office, 1954, Pp. 153. $0.45.

1 Stigler, George J., book review, Journal of the American Statistical Association, 46 (1951), 403-5;
see also Kaysen, Carl, book review, Review of Economics and Statistics, 36 (1954), 107-9.

2 Gideon Rosenbluth found that the weighted average concentration ratio, corrected for a certain
mathematical bias, declined by two percentage points during 1935-47, owing to the faster growth of the
less concentrated industries. On the basis of a stabilized population, that is, there would have been no
decrease or increase. See “Measures of Concentration,” Proceedings of the Conference on Business
Concentration and Price Policy, Princeton: Princeton University Press in 1955, for the National Bureau of
Economic Research, Pp. 79-83.

Changes in concentration are related to changes in other variables.
“[G]enerally speaking, the larger the expansion in number of plants
in an industry, the more likely was a decline in concentration” (p. 51);
and concentration also tended to decline in industries with rapidly
increasing production (54-7). Increase in concentration of physical
establishments (plants) was accompanied by increase of concentration of
companies (57-61). The information is presented in scatter diagrams;
to this reviewer’s eye, the association appears significant in each case,
but it would have been a considerable improvement to have computed 
correlation coefficients and their standard errors, and to have supplied 
more than the few and sometimes questionable explanations. There 
follows a generally useful chapter discussing twelve industries which 
experienced unusually large increases or decreases in concentration; 
the quality of the discussion is uneven, and one would hope for a more 
discriminating use of source materials. Thus, it may (or may not) 
be true that “independent barrel makers were at a competitive dis- 
advantage in obtaining steel during periods of shortage” (p. 81). But 
the only support for this conclusion is that an interested party, the 
vice-president of one of these companies, complained “in a letter to 
Smaller Business of America, Inc.” This kind of unverified complaint 
may at best be a lead; it is not evidence. 
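
The computation the reviewer asks for is a small one. The following is a minimal sketch, not the Report's or the reviewer's own procedure: it computes a Pearson correlation coefficient and the common large-sample approximation to its standard error, sqrt((1 - r^2) / (n - 2)). The function names and every data point are hypothetical, chosen only to illustrate a negative association between growth in number of plants and change in concentration.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences of at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def standard_error_of_r(r, n):
    """Large-sample approximation: sqrt((1 - r^2) / (n - 2))."""
    return math.sqrt((1.0 - r * r) / (n - 2))


# Hypothetical industries: percentage growth in number of plants, and
# change in the concentration ratio in percentage points.
plant_growth_pct = [5.0, 20.0, 40.0, 60.0, 90.0, 120.0]
concentration_change = [3.0, 1.0, -1.0, -2.0, -4.0, -6.0]

r = pearson_r(plant_growth_pct, concentration_change)
se = standard_error_of_r(r, len(plant_growth_pct))
print(f"r = {r:.3f}, approximate standard error = {se:.3f}")
```

With real data one point per industry would be entered; the approximation to the standard error is only rough for samples as small as the one shown.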

By far the best part of the Report is in certain appendixes. Appendix 
B computes indicators of homogeneity, measured by an industry’s 
shipments of its primary products as a per cent of its shipments of all 
products, and as a per cent of all industries’ shipments of the primary 
products. About one-seventh of all industries were eliminated from 
comparison because they fell short of a homogeneity standard of 66.7 
per cent for both indicators. One would appreciate some explanation 
of why the particular cutoff point was chosen, but the procedure is 
certainly correct, and the tables are useful for reference. Appendix 
C covers the reliability of measures, compiled by the Census Bureau, 
of the concentration of employment in individual industries in 1950. 
These are based on the Annual Survey of Manufactures, and are there- 
fore affected by sampling errors, including possible failure to include 
all the largest concerns. The higher the concentration ratio, the 
wider the range corresponding to a given confidence interval. This was 
to be expected from the general nature of the reliance on the largest 
units of a highly skewed distribution as a means of achieving a more 
efficient sample design. The resulting error of bias (in a non-probability 
survey) or lesser degree of precision (in a probability survey) are most 
likely during periods of rapid change. Also of importance for short- 
period changes is a special tabulation by Census showing what changes 
would occur if certain large establishments were classified in industries 
representing their primary products in 1950 rather than in 1947, to 
which the Annual Survey adheres. This “resistance to reclassification” 
is adopted by the Census Bureau because of the anomalies that might 
result when the products of a large establishment in one year were 
(say) 55 per cent in industry A and 45 per cent in industry B. In a
succeeding year the percentages might be reversed. Unless some resistance
principle were adopted, the result would be a spurious decline in
the output of industry A and a spurious increase in the output of B.
The Report does not give any coherent explanation of this practice 
and its justification; but it does classify 247 industries by concentration
ratios, standard error, and allowance for resistance effect. The re- 
viewer would consider these as more than mere detail improvements; 
greater practical use and also improvements in concept are both to be 
found along this line. 
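
The two homogeneity indicators described for Appendix B can be written out as a short sketch. Only the two ratio definitions and the 66.7 per cent standard come from the review; the names `specialization_pct` and `coverage_pct`, the rule that an industry is retained only when both indicators reach the standard, and all figures are assumptions made for illustration.

```python
from dataclasses import dataclass

# The 66.7 per cent standard is the one cited in the review; everything
# else in this sketch (names, figures) is hypothetical.
HOMOGENEITY_STANDARD_PCT = 66.7


@dataclass
class IndustryShipments:
    name: str
    own_primary: float   # the industry's shipments of its primary products
    own_total: float     # the industry's shipments of all products
    all_primary: float   # all industries' shipments of those primary products


def specialization_pct(ind: IndustryShipments) -> float:
    """Primary-product shipments as a per cent of the industry's total shipments."""
    return 100.0 * ind.own_primary / ind.own_total


def coverage_pct(ind: IndustryShipments) -> float:
    """Primary-product shipments as a per cent of all industries' shipments of them."""
    return 100.0 * ind.own_primary / ind.all_primary


def meets_standard(ind: IndustryShipments) -> bool:
    # Assumption: an industry is kept only if BOTH indicators reach the standard.
    return (specialization_pct(ind) >= HOMOGENEITY_STANDARD_PCT
            and coverage_pct(ind) >= HOMOGENEITY_STANDARD_PCT)


industries = [
    IndustryShipments("hypothetical industry A", 90.0, 100.0, 95.0),
    IndustryShipments("hypothetical industry B", 60.0, 100.0, 80.0),
]

for ind in industries:
    print(ind.name,
          round(specialization_pct(ind), 1),
          round(coverage_pct(ind), 1),
          "kept" if meets_standard(ind) else "eliminated")
```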

Much interest will attach to the Report’s computation of concentra- 
tion in “manufacturing as a whole.” This consists of the aggregate 
sales-plus-interplant transfers of the 200 largest manufacturing corporations
as a per cent of sales-plus-interplant transfers of all manufacturing
establishments. There is no industrial breakdown of the 200. This 
measure of concentration was 37.7 per cent in 1935, and 40.5 per cent 
in 1950. 

The use of sales-plus-interplant transfers as the dimension of size 
and concentration was not necessary for a 1935-1950 comparison. The 
1935 source3 tabulated the largest 200 manufacturers as accounting
for 31.1 per cent of value added by manufacture, for 24.8 per cent of 
employment, and for 37.7 per cent of value of product (sales). Why did 
the Report use this measure of size? Value of shipments to measure 
concentration in individual industries may be defensible as a practical 
matter or for certain purposes even in theory. But there is no such 
justification for using sales as a measure of size when grouping corpora- 
tions together across industry lines. The result is massive and uncon- 
trolled double counting. It is almost embarrassing to have to point 
out that the size of a business firm is defined by its own employees, 
assets, or contribution to national product—and not somebody else’s. 
To lump together, on the basis of their sales, corporations with widely 
varying ratios of sales to employment, assets, or value-added, is to 
aggregate horses and apples. 
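
The double-counting objection can be illustrated with a minimal sketch. The four firms and all their figures are hypothetical; value added is taken, as an assumption of the sketch, to be sales less purchases from other manufacturers. The firm with the largest sales is the one that buys most of its product from others, so its share of sales overstates its share of value added.

```python
# Hypothetical firms in a vertical chain. "purchases" are goods bought from
# other manufacturers; value added is sales minus those purchases.
firms = [
    {"name": "assembler", "sales": 250.0, "purchases": 190.0},
    {"name": "parts maker", "sales": 120.0, "purchases": 40.0},
    {"name": "materials maker", "sales": 70.0, "purchases": 10.0},
    {"name": "small shops", "sales": 60.0, "purchases": 20.0},
]


def value_added(firm):
    """The firm's own contribution: sales less purchases from other firms."""
    return firm["sales"] - firm["purchases"]


def share_of_largest(firms, k, size):
    """Per cent share of the k firms ranked largest by sales, measured by size()."""
    ranked = sorted(firms, key=lambda f: f["sales"], reverse=True)
    total = sum(size(f) for f in firms)
    return 100.0 * sum(size(f) for f in ranked[:k]) / total


sales_share = share_of_largest(firms, 1, lambda f: f["sales"])
value_added_share = share_of_largest(firms, 1, value_added)

print(f"largest firm, share of sales:       {sales_share:.1f} per cent")
print(f"largest firm, share of value added: {value_added_share:.1f} per cent")
```

The gap between the two shares depends entirely on the assumed ratios of sales to value added, which is the reviewer's point: those ratios vary widely across industry lines.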

3 National Resources Committee, The Structure of the American Economy (Washington: Government
Printing Office, 1939), Appendix 9.

Furthermore, this worst of measures for any given year will make it
impossible to compare the 1950 estimates with those for future years,
although the opening sentence of the Report states this as a primary
aim (p. 1). Value of shipments for all manufacturing has not been
published since 1939. The Report never states this fact, only referring
(p. 106) to “the Federal Trade Commission’s estimate of the value
of shipments of all manufacturing industries”. On the next page the
distinct impression is given, no doubt inadvertently, that this nonexistent
statistic can even be assigned a standard error! In any event,
had the present Census rule existed in 1935, the comparison made in 
this Report would have been impossible. Unless the Bureau changes 
its practice, it will be impossible to compare this estimate with any 
future estimates. 

Moreover, the use of value of shipments involves much unnecessary 
labor. For computing the share of the largest 200, and for no other 
purpose, the Report uses the item of interplant transfers and adds it to 
sales. No less than 35 per cent of the 1950 universe figure (sales-plus- 
interplant transfers of all manufacturing establishments) had to be 
estimated, with the usual approximations and assumptions, some better 
than others. All this adjustment would have been avoided simply by 
having the largest 200 respondents supply value-added (and/or em- 
ployment) instead of interplant transfers. Then published universe 
figures would have been available, the work would have been less, and 
there would have been two valid measures of concentration. 

The defects of value of product are compounded in measuring changes 
in size and concentration over any extended time period. With the 
exception of assets, which are essentially a moving average, centered 
at a point before the date of record, all measures of concentration based 
on one-year periods and current values are distorted somewhat by 
price changes and by the business cycle. As to price changes, value of 
products is particularly bad because it reflects fluctuations not only 
inside but also outside the firm, the industry, and all manufacturing. 
Thus, if the price of raw cotton were to rise more than the price of 
raw materials generally, the value of product of cotton textile mills 
and apparel manufactures would ceteris paribus rise more rapidly than 
the value of product of all manufacturing. Since cotton textiles and 
apparel are a relatively small-business industry, there would be a spuri- 
ous decrease of concentration in “manufacturing as a whole”. We need 
only recall that in 1935-1950 the wholesale price index doubled, with 
many divergences among industrial groups. 
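
The cotton example can be put in numbers with a short sketch. All magnitudes are hypothetical, and two simplifying assumptions are made: physical output is unchanged everywhere (the ceteris paribus of the text), and the whole rise in raw cotton cost passes into the value of product of the textile and apparel group.

```python
def top_share_pct(top_value, other_value):
    """Value of product of the largest firms as a per cent of the total."""
    return 100.0 * top_value / (top_value + other_value)


# Hypothetical values of product before the price change.
top_200_value = 400.0      # the largest corporations
textiles_value = 200.0     # cotton textiles and apparel (small-business group)
raw_cotton_cost = 100.0    # raw cotton embodied in textiles_value
other_small_value = 400.0  # all other smaller manufacturers

before = top_share_pct(top_200_value, textiles_value + other_small_value)

# Assumption: raw cotton rises 50 per cent relative to other materials and
# the full cost increase passes into the value of textile products.
cotton_price_rise = 0.50
textiles_after = textiles_value + raw_cotton_cost * cotton_price_rise
after = top_share_pct(top_200_value, textiles_after + other_small_value)

print(f"share of largest firms before: {before:.1f} per cent")
print(f"share of largest firms after:  {after:.1f} per cent")
```

Nothing real has changed between the two computations, yet the measured share of the largest firms falls.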

As for cyclical changes: in 1935 manufacturing capacity was roughly
half utilized,4 which means that many industries and companies,
mostly in durable goods, were spuriously “small.” The year
1950, on the other hand, was one of full or over-full employment. Its
second six months saw a speculative Korean war boom, which temporarily
over-stimulated the real output of some industries. (It also
caused speculative price movements, all irrelevant but all recorded in
the 1950 value of shipments.) An examination of 1935-50 corporate
sales by industry groups shows that the combined effect of price changes
and cyclical changes was to increase the sales of the big-business
industries, which account for practically all the sales of the top 200,
substantially more than manufacturing corporate sales generally. This
probably more than accounts for any “increase” in the FTC measure.
(Statistics of Income shows that in 1931 the 139 largest manufacturing
corporations held 46.5 per cent of the assets of all manufacturing
corporations; in 1949 the largest 142 held only 43.4 per cent.) But
whether concentration increased or decreased over this period, a
measure based on sales-plus-interplant shipments deserves no credence;
the figure has no meaning.

4 The Federal Reserve Board index of manufacturing production (1935-39=100) stood at 87 in
1935, and at 168 in 1941; durable goods production was respectively 83 and 201.

Obviously, this report was prepared before a majority of the present 
Commission assumed their posts. Of the new arrivals, the Chairman 
has expressed strong support of more and better economic and sta- 
tistical research by the agency. This report contains two excellent 
guideposts. The broad-brush portrayal of increasing concentration 
which drew attention in the press and the Congress is based on a tech- 
nique which was outmoded at least a quarter-century ago, when the 
pioneering estimates of Gardiner C. Means appeared; the results mean
nothing. In contrast, the undramatic appendixes which get down to the
actual job of measuring concentration are a genuine contribution and 
give promise of more. 





COLONIAL SOCIAL ACCOUNTING* 


William O. Jones
Stanford University 


PHYLLIS DEANE’s report on the experiment with national income
accounting applied to colonial territories, published in 1948 as an
Occasional Paper [2] of the National Institute of Economic and Social 
Research, concluded with the statement that “a more comprehensive 
and direct knowledge of the social and economic structure of Central 
African peoples was essential if a satisfactory framework was to be 
evolved”. In order to obtain this knowledge the investigation was to be 
transferred to the African colonies themselves, and direct field enquiry 
used to supplement the existing information. Colonial Social Account- 
ing is a final report on that enquiry. It contains revised estimates of the 
national incomes of Northern Rhodesia and Nyasaland (Parts II and 
III) together with an account of village economic surveys and of some 
of the conceptual and operational difficulties involved (Part IV), and 
a detailed summary of the sources of the estimates (Appendixes I and
II). 

The most valuable chapters are those describing the problems that 
must be solved if native African economies are to be forced into this 
mold that fits European and American economies only imperfectly. 
Unfortunately these sections will be read and quoted less widely than 
the estimates themselves. The estimates of per capita income of the 
African population are already being quoted as if they were reliable 
measures of level of living, and will undoubtedly attain wide currency. 
It might have been better to arrange the parts so that the qualifications
precede, rather than follow, the estimates. It is all too easy now
to read the first 100 pages, containing all of the income tables with 
little suggestion of their fundamental untrustworthiness, and stop. 
But perhaps it was reasoned that people rarely pay any attention to 
the qualifications placed on statistical tables. 

* A review article on Deane, Phyllis, Colonial Social Accounting. Cambridge, England: The National
Institute of Economic and Social Research, Economic and Social Studies XI, 1953. Pp. xv, 360.
$10.00.

The book itself speaks with two voices: the straight face with which
the estimates are presented is disturbingly inconsistent with the bewilderment
expressed in later chapters over the problem of evaluating
native activities in units commensurable with those used for the European
part of the economy. Deane has provided a handsome target for
the reader critical of attempts to assign money values to goods and
services that are rarely exchanged for money and to the multifarious
activities having no value at all in American or European eyes; she 
has also provided some of the rocks to throw, and has even tossed a 
few herself. Indeed, it is not altogether clear whether she feels the 
experiment in “colonial social accounting” to have been a success or a 
failure. The concluding paragraph of the book begins with the following 
disheartening statement (p. 228): 

“These, however, are the basic truths about colonial social accounting, 
and it is difficult to see how they can honestly be thrust into the back- 
ground. The statistical material is inadequate for purposes of precise 
and intelligent analysis, and the concepts which are applicable to a 
money-exchange economy mean relatively little in the context of a 
subsistence economy... . ” 

Were it not for the word “precise” in the foregoing, this would seem 
to be as full an admission of failure as the most severe critic could ask. 
It is in sharp contrast with the cold tables of Parts II and III of the 
book in which the author records and combines estimates of incomes 
of Africans and Europeans without apparent embarrassment. 

The experiment was of course not a failure. It produced estimates of 
various components of the national incomes of the two territories, some
of fair accuracy. And even the total figures are probably good guesses 
as to the magnitudes they attempt to consider. These are certainly 
estimates of national income in the restricted sense in which this term 
is customarily used. But it is seriously open to question whether the 
figures for the native sector of the economy may be taken either as 
indicators of level of living or of economic output. They are estimates, 
valued in European measures, of those parts of native production that 
have counterparts in the European sector. 

Even the making of such estimates as these was a formidable task; 
the author performs the additional service of pointing out their many 
weaknesses and the wide ranges in which the true values seem to lie. 

The problem Deane set herself was to assign a monetary value to the 
product or service of every “economic” activity. Prices to be attached 
to each activity were derived from its contact, however slight, with the 
exchange economy. If 5/10ths of output was consumed by producers, 
4/10ths bartered, and 1/10th sold, the part sold determined price. This 
might be a very narrow market; it might also be an abnormal market, 
resorted to only in times of great need or affluence, or to obtain money 
for trivial or unusual purchases.1
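
The arithmetic of this valuation rule is simple enough to sketch. The split of output into tenths is the illustration given above; the quantity of output and the observed price are hypothetical. The sketch only shows how much of the imputed total rests on a price that was observed for one-tenth of the output.

```python
from fractions import Fraction

# Shares of output from the illustration in the text.
consumed, bartered, sold = Fraction(5, 10), Fraction(4, 10), Fraction(1, 10)
assert consumed + bartered + sold == 1

# Hypothetical figures: total output and the price seen in the part sold.
output_units = 1000
observed_price_pence = 3  # pence per unit, in the narrow market

# The rule values ALL output at the price observed in the part sold.
imputed_total_pence = output_units * observed_price_pence

# Only the sold part was ever exchanged at that price.
transacted_pence = output_units * sold * observed_price_pence
share_imputed = 1 - Fraction(transacted_pence, imputed_total_pence)

print(f"imputed value of output: {imputed_total_pence} pence")
print(f"share of that value never exchanged for money: {share_imputed}")
```

Any error in the observed price, whether from a narrow or an abnormal market, is carried proportionally into the whole imputed figure.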





1 There is in fact no satisfactory way of obtaining the retail prices prevailing in native markets. 
Each sale is usually at a price reached only after bargaining, and the price varies from market to market
and even from buyer to buyer. Prices quoted to Europeans may be higher or lower than those at which 
sales are made to Africans. 







A simpler solution might have been adopted, that of ignoring all 
transactions except those in which money is involved. The Central 
African Statistical Office estimates Net National Income for Northern 
Rhodesia generated in the money economy only, explaining that “it 
was felt advisable to omit any statement of the value of subsistence 
output as it could only be a notional figure that could not be checked or 
corrected in any way... ” and would probably not be additive to the 
estimate for the money economy [1]. 

But ignoring the “subsistence economy” in Northern Rhodesia and 
Nyasaland is a pretty serious business for one who, like Deane, believes 
that “the need is for a design of accounts which shows up the distribu- 
tion of the national income among the different groups, the nature of 
each one’s contribution to it, and the characteristic composition of 
each group’s standard [level] of living” (p. 5). In Central Africa, most 
of the native population satisfies its economic wants in the subsistence 
sector, and failure to include that part of national output leads to a 
very peculiar measure of individual income. 

When account is taken of the statistical problems involved in ob- 
taining the kind of measure Deane sought, and the economic problems 
involved in interpreting the measure once it was obtained, the solution 
adopted by the Central African Statistical Office appears to have merit. 
National income generated in the money economy is an understandable 
magnitude. It does not pretend to tell anything about economic wel- 
fare of the entire society; it measures change in comparatively simple 
terms; and it summarizes information about that part of the economy 
in which the consequences of governmental intervention are most 
nearly predictable. 

Administrative budgets in the colonies have tended to be tight, with 
little money available to support even the most elementary statistical 
services. There has never been a census of population, of agriculture, 
or of business in the colonies considered by Deane. Estimates of popu- 
lation and agriculture are usually built up from reports of District 
Commissioners on tour. The population estimates made from “tour 
counts” are very rough indeed and during the years to which the study 
applies there had been very little touring.2

Estimates of agricultural production are much worse. To find a value 
for native agricultural output to put in the Nyasaland table (it is the 
largest individual value in that table), the following assumptions were 
involved (pp. 317-18): 





2 A sample census of population in Northern Rhodesia was taken in 1950, too late for use in the 
study. But Deane believes a sample census little more accurate than District Commissioners’ counts 
(p. 231). Population estimates from the sample census vary little from those used by Deane. 







1. Assume “that the average cultivated acreage per family in South-
ern Province was 2 acres and in Northern and Central Provinces 3
acres ...”

2. Assume “that the number of families was equivalent to the 
number of married or widowed women as shown in the 1945 Census, 
...” Of this census the Colonial Office says [3] “it claimed to be no 
more than a useful, and in the aggregate a fairly accurate estimate of 
the African population based on a count”. 

3. Assume “that maize or its value equivalent was grown on about 
75 per cent, groundnuts, beans or their value equivalent on 20 per cent, 
and cassava or its value equivalent on 5 per cent”, 

4. Assume yields per acre for each of these three groups of crops.

5. Assume prices for each of these three groups of crops. 

6. Assume “that each individual consumed in a year an average of
10 lb. of fruit, green leaves, etc., not grown on the cultivated acreage, ...”

7. Assume that the fruit and green leaves can be valued at 2 pounds 
for a penny. 


By procedures such as this Deane arrived at her estimate that African 
income from independent agriculture amounted to £3,561,000 in 1945 
(p. 98). 
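The chain of assumptions above amounts to a simple multiplication, and a minimal Python sketch may make clear how completely the final figure rests on the assumed inputs. Only the acreages per family, the crop shares, the 10 lb. of fruit, and the price of 2 pounds for a penny come from the passage; the family counts, yields, prices, and population used in the example are hypothetical placeholders, since the review does not report the values Deane assumed.

```python
# Sketch of the valuation chain described in assumptions 1-7.
# Figures quoted in the review (Deane, pp. 317-18):
ACRES_PER_FAMILY = {"Southern": 2.0, "Northern": 3.0, "Central": 3.0}
CROP_SHARE = {"maize": 0.75, "groundnuts_beans": 0.20, "cassava": 0.05}
FRUIT_LB_PER_PERSON = 10.0      # lb. of fruit, green leaves, etc. per head per year
FRUIT_LB_PER_PENNY = 2.0        # "2 pounds for a penny"
PENCE_PER_POUND_STERLING = 240  # 12 pence to the shilling, 20 shillings to the pound


def crop_value_sterling(families, yield_lb_per_acre, price_pence_per_lb):
    """Value of cultivated output in pounds sterling (assumptions 1-5)."""
    total_pence = 0.0
    for province, n_families in families.items():
        acres = n_families * ACRES_PER_FAMILY[province]
        for crop, share in CROP_SHARE.items():
            total_pence += (acres * share
                            * yield_lb_per_acre[crop]
                            * price_pence_per_lb[crop])
    return total_pence / PENCE_PER_POUND_STERLING


def fruit_value_sterling(population):
    """Value of uncultivated fruit and leaves in pounds sterling (assumptions 6-7)."""
    pence = population * FRUIT_LB_PER_PERSON / FRUIT_LB_PER_PENNY
    return pence / PENCE_PER_POUND_STERLING


# HYPOTHETICAL inputs, for illustration only; these are not Deane's figures.
families = {"Southern": 100, "Northern": 50, "Central": 50}
yields = {"maize": 1000.0, "groundnuts_beans": 500.0, "cassava": 2000.0}
prices = {"maize": 0.5, "groundnuts_beans": 1.0, "cassava": 0.25}

base = crop_value_sterling(families, yields, prices)
doubled = crop_value_sterling(families, {c: 2 * y for c, y in yields.items()}, prices)
print(round(base, 2), round(doubled, 2), round(fruit_value_sterling(1000), 2))
```

Because every factor enters multiplicatively, a proportional error in any assumed yield or price passes straight through to that crop's contribution to the total; doubling all the assumed yields doubles the estimate.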

Even when it comes to estimates for the European part of the popu- 
lation, the author confesses some uneasiness. In Northern Rhodesia the 
number of Europeans earning incomes was only 8,225 (the number in 
Nyasaland was 1,200), all incomes over £700 were subject to tax and 
the 1946 Census of non-Africans contained detailed questions about 
incomes, and three industries, mines, railways, and government, em- 
ployed five-eighths of all Europeans with income, yet Deane states 
that the margin of error in the estimate of total money incomes of 
European individuals was about 6 per cent (pp. 237-243). The error in 
the estimate of total money earnings of Asians and Colored (mulattos) 
must have been larger. 

But Europeans, Asians, and Colored make up a very small part of 
the total population (about 1 per cent in Northern Rhodesia, about 
one-quarter of 1 per cent in Nyasaland), and anyone who has read 
Deane’s account of how estimates of incomes for this part of the popu- 
lation were made, of the cross-checking, comparing, and calls-back, 
must be convinced that the errors here are not so large as to be damag- 
ing. It is in the estimates for the other 99 per cent of the population, 
the Africans, who have never been counted, but who are the population
of the country, in any normal meaning of the word, that the trouble
lies. 

We can divide the income, or the output, or the expenditures of the 
African population into two main parts: that which is in and a part of 
the European-Asian economic order, and that which is completely 
within the old native order. Probably no individual will be completely 
part of either order, nor will many activities. The native houseboy 
who is paid in money, food, and lodging, but whose food is African 
food and whose lodging may be African lodging, both quite different 
from those favored by his employer, and who still preserves ties with 
the old order through his village and through a town social system that 
has close links with the past, clearly produces and receives income in 
both worlds. On the other hand, the farming family that produces 
almost all of its own food, and makes its own house and furniture,
will receive a little money income from the sale of farm products or 
from a family member employed by Europeans, and will buy a little— 
pots and pans, cloth, salt, soap for example—in the markets. 

It is probably the output and income of the “non-money” part of
the economy that is most important to the native population, and it is 
in the valuation of this income that the investigator runs into serious 
trouble. The trouble takes two forms; one can be solved conceptually, 
the other so far has not been. The first problem is that of counting or 
measuring the various products and services in units peculiar to each— 
in number of pounds or gallons or yards, in number of chairs or stools 
or tables or houses, in hours or minutes during which the service is 
enjoyed, or even in the decreases in numbers of stillborn, in deaths at 
childbirth or from particular diseases. The second problem is one of 
the oldest known to economists, how to convert all these disparate 
counts and measures into one, and that one somehow to be a measure 
of value, of total economic output, or of welfare. 

Deane has labored courageously at the first task. She has probably 
done as well as could be done. (I can not help wishing she had shown 
us a little more of the way in which she arrived at the output of native 
agriculture.) In her attempt to measure the magnitude of native output 
she proceeded with great care and understanding. The chapters in 
Part IV that describe the job of trying to get aggregate measures for a 
fragmented, largely illiterate society, self-contained to a remarkable 
degree, and with little money should be read by all who contemplate 
such an undertaking or who wish to use her estimates. These chapters 
also contain a great deal of useful and rare information on money 
income, property, purchases, and gifts in the communities studied. 







The second problem, however, she really does not try to solve at 
all. Instead she uses a device similar to that employed by the Central 
African Statistical Office, only going a little further. The total tonnage 
of corn produced is multiplied by the price for which some corn was 
sold in some native markets; the housing supplied to miners as a part 
of wages is translated into money at the estimated average rent paid 
for rooms and dwellings actually rented for cash (p. 247); plates, cups, 
and knives were valued at “the average price for all the recorded trans- 
actions in that commodity for the area concerned” (p. 193). By such 
means as these it is possible to translate a wide variety of measures into 
money. What the meaning of the translated values is, if they have any 
at all, is another matter. 

Deane points out clearly one of the difficulties in such translations 
when she describes her handling in the village surveys of “Transactions 
Without Cash” (p. 195). 

“It seems, for example, that barter transactions in goods and services 
for immediate consumption are both common and casual. A man may 
barter a chick for some beer because he is temporarily short of cash 
and a beer party is in progress. A woman may barter a few groundnuts 
for a cupful of salt because she has not time to go to the store for it. 
A man may prescribe for another’s cough and receive a pumpkin in
exchange. All these are small and ephemeral transactions, often en-
tirely unconnected with current exchange values, ...”


In Chapter XIV there is recognition of the importance of family 
production together with a valuable summary statement of the problem 
of estimating it, and of the artificiality of the technique used to include 
it in the estimates. 

How important native agriculture is to the total economy can only 
be suggested by the estimates; for 1945 it contributes 40 per cent of 
national income for Nyasaland (p. 98), and 23 per cent for Northern 
Rhodesia (p. 64), where the mining industry accounts for a slightly 
larger share. It is difficult to appraise the possible error in the estimate 
of native agricultural output, even accepting the author’s device for 
translating into money. That the true values may be twice as great 
as those she uses is not at all unlikely. 
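A short calculation suggests what such an error would do to the shares just quoted. The sketch below assumes, purely for illustration, that only the native agricultural component is revised and that every other component of national income is left unchanged; the function name is hypothetical.

```python
def revised_share(share, factor):
    """Share of one component in the total after that component alone is scaled.

    `share` is the component's original share of the total (between 0 and 1);
    `factor` is the multiple by which the component is revised.
    """
    scaled = share * factor
    return scaled / (1.0 - share + scaled)


# Shares quoted in the review for 1945; the factor of 2 is the reviewer's
# "twice as great" possibility.
for territory, share in {"Nyasaland": 0.40, "Northern Rhodesia": 0.23}.items():
    new = revised_share(share, 2.0)
    growth = share * (2.0 - 1.0)  # proportional rise in total national income
    print(f"{territory}: share {share:.0%} -> {new:.0%}, total income up {growth:.0%}")
```

Under these assumptions, doubling native agricultural output would raise its share from 40 to about 57 per cent in Nyasaland and from 23 to about 37 per cent in Northern Rhodesia, with total national income rising by 40 and 23 per cent respectively.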

Estimation of the national income of dual economies like those con- 
sidered here involves essentially the same sort of problem as is en- 
countered in comparison of national incomes of two separate economies. 
The difference is purely arithmetic; in one case interest attaches to a 
sum, in the other to a difference. 

But the nature of the difficulty is somewhat obscured when one and
the same currency is used in both parts of the economy, as it is in the
native and European sectors of the colonies. If different currencies 
were used, attention would at once be directed to finding an appropriate 
relationship between them, and it could be hoped that the pitfalls in 
such comparison would become apparent. 

Economists have achieved some success in comparing incomes of 
economies that are similar in their organization and in their values, but 
even then the best we can hope for is that a high percentage of the items 
in the two economies correspond and are similarly valued. Complete 
correspondence in the nature and the valuation of all economic goods 
and services is too much to expect, and we must always resign ourselves 
to a degree of incommensurability. 

When two economies are as different as those of the native and Euro- 
pean parts of colonial territories, the extent of correspondence of items 
and values is apt to be very small. If, in addition, the money values of 
goods and services exchanged between the two economies are deter- 
mined almost entirely by one of them, comparison of outputs in mone- 
tary terms will be misleading in the extreme. 

I think it is not open to argument that the standards of values, or 
the standards of living in the strict sense, of the traditional native 
societies of Africa differ about as widely as is possible from the standard 
of the intrusive European societies. As a result, goods and services 
produced by the native economy are apt to be improperly valued, or 
even completely overlooked, by the investigator who uses European 
standards to measure them. 

The native standard of living may contain things that are not in the 
European standard at all, or if present are lost to the view of the statis- 
ticians. Included here are the apparently pervasive desire for identifica- 
tion with and absorption in the village, tribe, or extended family. This 
may be valued for the security it provides in terms of food and shelter, 
or for the deeper security of belonging, of being a part of a greater whole 
in some emotional or psychological sense. Perhaps leisure should also 
be included here, although leisure activities are valued in the European 
world. I should be very much inclined to include as one part of the 
standard a way of life that is not bound by hours of the clock, that 
permits flexibility in the nature of activities and in the time devoted to 
them. 

Other elements in the native standard may have their counterparts 
in the European world, but carry much lower values. Native religious 
or magical ceremonies, rites accompanying birth, puberty, marriage, 
and death, the satisfaction of participating in cooperative undertak-
ings, the pleasures derived from taking part in the activities of the
market all have their parallels in other societies. They do not, however, 
appear in national income accounts except as they involve expendi- 
tures, such as payment to build a church, initiation fees for a lodge, the 
purchase of confirmation gowns, or the cost of attending a national 
convention. 

There are also many components of the native standard not assigned 
a value by Europeans even though valued European counterparts 
exist. Included here are dances, beer parties, and other entertainments, 
cooperative services and labor exchange, works of art, and sport, not 
only for itself alone, but sport inextricably interwoven with provision 
of material needs, such as fishing and hunting, and even some farming 
activities, as for example the cutting of high tree branches in new clear-
ings.

Beyond the problems introduced by the considerable lack of cor- 
respondence between elements of the two economies, additional com- 
plications may arise because of the peculiar role of money in the native 
economy. Deane’s comments already quoted on relative values of 
bartered goods may also apply to merchandise offered for sale. Money 
transactions may be completely trivial to the native living in the vil- 
lage, either because the need for money is small or because the share of 
total output sold is small. On the other hand, the price may be trivial 
because the desire either for money or for goods is urgent but at the 
same time sporadic. 

Moreover, if the use of money prices is unfamiliar, and if rates of 
exchange of goods and services tend to be determined by tradition and
hence inflexible, equivalences that might be inferred from trade are 
again quite misleading. 

It is a common observation that the native’s desire for money may 
be satisfied fairly quickly, and there is frequent complaint of the way 
in which hired laborers stop work when they have obtained some 
desired, usually small sum. This undoubtedly reflects the availability 
of goods in the market: a limited variety of useful merchandise that is 
within the native’s horizon of possible acquisitions, other highly prized 
articles completely outside his potential income. It may well be that 
so long as the native income remains low, the demand for money will 
likewise be low, but that after some higher income level is reached, 
demand for money will shoot up sharply. 

Also associated with the general shortage of money may be an over-
valuation of currency. If a very small supply of coins and bills is avail-
able to handle an increasing volume of trade within the native economy,
there may be a continuing deflationary effect not corrected by an in- 
flow through the narrow channel of trade connecting with the European 
economy. 

These conjectures about the place of money in the native economy 
require further examination and comparison with the facts. Taken with 
what is known about the great differences between native and European 
living standards, however, they go far to explain why investigations 
conducted along the lines followed by Deane give what appear to be 
nonsensical answers. 

Incomes of £6 per head per year arrived at by Deane are in no way 
an appraisal of economic output, as valued by those who produce it, 
or of level of living, as valued by those who experience it. Valuation 
of housing at 6s. per employed miner per month is a similarly meaning- 
less figure. It is not implied that these numbers are without meaning 
to those who know native standards and ways of life, but they simply 
cannot stand alone, sufficient in themselves, as measures to be added 
to the estimated value of European, Asian, and Colored incomes. 

Much of the foregoing may be interpreted as an attack on a straw 
man. Certainly the reader is warned in the last half of the book of 
many of the dangers inherent in the figures. But calculation of the 
magnitude called national income in itself implies meaning that can- 
not be supported by the facts. 

If we expect the native society eventually to be remade in the image 
of the European, then, when that time comes, national income esti- 
mates of the kind made here will be just as valid as they are in Europe 
or the United States. In the meantime they may even serve as some 
sort of index of the extent to which the native economy has been Euro- 
peanized, although a better measure would seem to be that of national 
income originating in the money economy. 

Colonial Social Accounting reports an honest attempt to apply to 
backward economies a technique that has had considerable success 
when applied to more clearly market-oriented economies. Defects of 
measurement that are troublesome but probably not serious in America 
and Europe, such as those arising from appraisal of unpaid household 
labor, on the one hand, and from the differing values placed on leisure, 
fine cooking, spaciousness, or independence, on the other, here have 
proven sufficiently grave to jeopardize the entire enterprise. 

For appraising level of living, for measuring changes in output and 
productivity, the estimates are unsatisfactory. Of course we can say
that output is only economic if Europeans think it so, and to the extent
that they think it so, and let it go at that, but I find this a rather uninter-
esting solution. It would be much better to recognize the incommensur-
ability of the components and not aggregate them at all. Much more
can be learned about output, productivity, and level of living by studies 
of agricultural yields of specific crops in pounds per acre and per man, 
by collection of data adequate to permit calculation of food balance 
sheets, by careful diet studies, by statistics of mortality, and by other 
similar investigations yielding results that have clear meaning in them- 
selves. It would seem wiser for colonial statisticians to spend the limited 
funds available to them on studies such as the sample surveys of agri- 
culture conducted in Nigeria in 1950-51 [4], and in Tanganyika in 
1950 [7], on improvement of trade statistics [6], or even on detailed 
diet studies like those made in Nigeria by B. M. Nicol [5]. Better trade 
statistics could greatly improve our knowledge of the money economy; 
improving agricultural statistics in some areas and remedying the total 
absence of such data in other areas would vastly improve our knowledge 
of native economic production; and even small samples of native diets, 
if taken carefully and reported in detail, would move us a long way 
forward in our knowledge of native welfare. 

The priority job in colonial statistics should be to get better esti- 
mates of the important magnitudes. Even when this has been done, 
however, it may still be necessary to keep them separated, to resist the 
temptation to aggregate them. Nor should we regard estimates of farm 
output and of food consumption as other than the crudest measures of 
welfare. More complete knowledge can come only as a result of exten- 
sive comparisons between native standards and levels of living. On the 
other hand the elementary measures can be used in comparing similar 
native economies and in measuring change over time. Similarly, im- 
proved statistics for the money economy will provide more precise 
bases for forecasting the consequences of intervention in that area. 

If I criticize Deane’s analysis, it is not because she has failed to 
recognize and discuss the problem of evaluating native product, but 
because she has then gone ahead in her calculations as if these problems 
did not exist. Chapter IX, “Social Accounting for Primitive Communi- 
ties”, contains enough arguments against the use of national income 
accounting in primitive communities to stop all but the most enthusias-
tic devotees of the method. Yet however great the obstacles may be,
Deane forges resolutely ahead. “It is almost always possible”, she says,
“to cite an instance where a money price has been charged within
recent memory and in the same general area. On the other hand, it is
probably safe to say that for the majority of economic activities there is 
no market value as it is understood in the money economies. When 
goods are not normally traded the prices of the goods that are traded 
do not reflect the value in resources used or in relative desirability of 
subsistence output” (p. 123). 

She then cites as an example the sale of food during a period of short- 
age before harvest, properly noting this as an extreme case, but: “For 
all the most important commodities there is a price which has some kind 
of meaning, even if the meaning is irrelevant to the internal economy 
of the village” (p. 123). 

When it comes to the use of money within the native economy, 
Deane says, “although money is used freely and money value is a 
familiar concept in the semi-subsistence economy, the prices are not 
necessarily part of an integrated system.... There is no common 
standard of value which would make it possible to consider each eco- 
nomic activity as a function of its drain on the community’s total scarce 
resources or of its contribution to the community’s total needs” (p. 
124). 

There is a great deal more in this chapter to the same effect. It pro- 
vides effective argument against making and publishing the kind of 
tables that are the heart of Parts II and III. And yet the chapter con- 
cludes with the statement, “The best that can be said about a social 
accounting system when applied to a primitive economy, . . . is that it 
takes all sectors and aspects of the national economy into account and 
that it provides a very rough and often one-sided indication of their 
relative contributions to total national economic activity” (p. 130). 

There must be considerable doubt as to whether such accounts 
provide even a rough indication; there can be no doubt about their one-
sidedness. More consistent bias, as in the Central African Statistical
Office’s estimates, would be preferable.


REFERENCES 


[1] Central African Office of Statistics, The National Income and Social Accounts 
of Northern Rhodesia, 1945-1949, Salisbury, Southern Rhodesia, May 
1950, 1. 

[2] Deane, Phyllis, The Measurement of Colonial National Incomes: An Experi-
ment, Cambridge, England: National Institute of Economic and Social Re-
search, Occasional Paper XII, 1948.

[3] Great Britain Colonial Office, An Economic Survey of the Colonial Territories: 
1951; Vol. I, Col. No. 281 (I), 1952, 36.







[4] Nigeria, Department of Statistics, Report on the Sample Census of Agricul- 
ture, 1950-51, Lagos, 1953. 

[5] Nicol, B. M., “The nutrition of Nigerian peasants with special reference to 
the effects of deficiencies of the vitamin B complex, vitamin A and animal 
protein,” British Journal of Nutrition, 6 (1952), 34-55. 

[6] Seers, Dudley, “The role of national income estimates in the statistical 
policy of an underdeveloped area,” Review of Economic Studies, XX (1952- 
53), 159-68. 

[7] Tanganyika, Report on the Sample Census of African Agriculture: 1950 (re-
vised), prepared by the East African Statistical Department, November
1953.







CONCEPTS EMPLOYED IN LABOR FORCE MEASURE- 
MENTS AND USES OF LABOR FORCE DATA 


A. Ross Eckler, Gertrude Bancroft, and Robert Pearl
Bureau of the Census 


Controversy over labor force concepts arises to a large ex- 
tent because of the varied and often-conflicting uses of the 
data. The Census Bureau has tried to meet this problem by 
providing detailed information on major categories and by 
experimental work to measure borderline groups. Types of 
questions that have been examined are: difference between 
labor force participation during a year and a calendar week; 
underemployment of employed persons; strength of job at- 
tachments of persons not at work for various reasons; the ap- 
propriate classification of persons on the fringe of the labor 
force. Efforts have been made also to reconcile statistics from 
households and from business and administrative records. 


It is a well-accepted principle that concepts should be developed in
the light of the uses to be made of the data being collected. However,
the uses of labor force data are manifold, and often conflicting. One of 
the important uses is to provide a comprehensive measure of changes in 
economic activity. There is general recognition that not only are the 
employment and unemployment totals important, but also that the 
changes in hours of work need to be watched, as well as changes in the 
number of those who are working on a part-time basis because of ac- 
tions taken by plants to avoid lay-offs. Changes in duration of un- 
employment are also used as an indication of the significance of a 
particular business readjustment. 

Another important category of use, but one which varies markedly 
in public interest, is in connection with general manpower analysis. At 
the time of large-scale defense effort or of actual mobilization, data on 
the labor-force status and other characteristics of the population are 
valuable in providing the basis for estimating manpower potentials 
under various assumptions. 

A third important use is to give current information on some of the 
longer-run social and economic population changes which are more 
fully shown in the results of the decennial censuses. The current labor 
force series reveals, for example, changes in the age of entrance into 
the labor force as trends toward the prolongation of schooling continue. 
At the other end of the working life, they show the extent to which 
retirement is affected by social security and other provisions. Data on 
family composition, including the number of young children, show the
relationship between family duties and outside employment, and thus
indicate the extent to which the proportion of women in the labor force 
continues its long-run rise. 

Other uses may be mentioned even more briefly. For example, the 
labor force series is the only comprehensive source of information on 
groups of workers to whom special attention is directed from time to 
time such as veterans, older workers, non-whites, domestic workers, 
the disabled, and others. The data are of interest to some analysts as 
an approach to measurement of labor input, while others would like to 
relate them to the production of consumer income. 

Not only do the uses exhibit wide variety, but there is also sub- 
stantial difference of opinion as to the values to be attached to various 
features of the data. At one and the same time, we are being urged to 
increase the accuracy of the figures, to expand the detail published, to 
speed up collection, to improve methodology but maintain consistency, 
to develop alternative measures, and to reconcile these data with other 
statistical series. Those interested in the measurement of economic 
developments are likely to stress speed, accuracy, and sufficient con- 
tinuity to make month-to-month and year-to-year comparisons 
significant. Others concerned with broad social and economic analysis 
do not require great accuracy and speed, but want a wide variety of 
information and many detailed cross classifications. Those studying a 
particular group would, of course, ask for the maximum amount of sub- 
division for that group. Still others, interested in neither level of ac- 
curacy nor in consistency of classification, are ready to accept whatever 
is published and are impatient with technical sections describing 
sampling and non-sampling errors. 


STATEMENT OF PRESENT CONCEPTS 


The present labor-force concepts have been used for less than 20 
years. The 1940 Census gave up the “gainful worker” concept based
upon the reporting of a gainful occupation and adopted in its place the 
concept of “activity or behavior” in a particular week. During the 
period since 1940, the basic concepts have not been changed, but as 
will be described later, a good deal has been done in clarifying the 
concepts and in recognizing and attempting to remedy some of the 
deficiencies. 

The present concepts have been described at so many places that a
brief review should suffice at this time. The classification of a person 
by employment status is based primarily on his reported activity dur-
ing a specific calendar week. If the person worked at all for pay or
profit during the week—or without pay for 15 hours or more in a
family-operated enterprise—he is classified as working, and hence as 
employed, regardless of other activities. If he was not working but was 
looking for work, he is classified as unemployed. In order to obtain 
realistic current counts of the employed and unemployed, certain so- 
called “inactive” groups are added to each primary category. Included 
with the employed are persons who were neither working nor looking 
for work, but who had definite jobs or businesses from which they were 
temporarily absent the entire week for such reasons as vacation, illness, 
industrial dispute, and temporary (less than 30 day) layoff. Included 
with the unemployed are those who would ordinarily have been looking 
for work but were not because of temporary illness, belief that no work 
was available in their line of work or in the community, or because they 
were awaiting recall to jobs from which they were on indefinite layoff. 
The civilian labor force includes the employed and the unemployed; 
the total labor force also includes the armed forces. Civilians neither 
employed nor unemployed are classified as not in the labor force. 
The basic classifications for the monthly series are applied only to the 
population 14 years old or over.
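The classification rules just described form a priority sequence, which can be sketched in Python as follows. This is only an illustration of the rules as stated in the text, not the Census Bureau's actual processing procedure; the field names are hypothetical, and the order in which the two “inactive” groups are tested is an assumption, since the text treats them as non-overlapping.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    EMPLOYED = "employed"
    UNEMPLOYED = "unemployed"
    NOT_IN_LABOR_FORCE = "not in labor force"
    ARMED_FORCES = "armed forces"


@dataclass
class Person:
    # All field names are hypothetical; they mirror the conditions in the text.
    age: int
    in_armed_forces: bool = False
    worked_for_pay_or_profit: bool = False
    unpaid_family_hours: float = 0.0
    looking_for_work: bool = False
    absent_from_job: bool = False      # vacation, illness, dispute, layoff under 30 days
    inactive_unemployed: bool = False  # temporary illness, no work believed available,
                                       # or awaiting recall from indefinite layoff


def classify(p):
    """Return the Status for the survey week, or None for persons under 14."""
    if p.age < 14:
        return None
    if p.in_armed_forces:
        return Status.ARMED_FORCES
    if p.worked_for_pay_or_profit or p.unpaid_family_hours >= 15:
        return Status.EMPLOYED         # working takes priority over other activities
    if p.looking_for_work:
        return Status.UNEMPLOYED
    if p.absent_from_job:
        return Status.EMPLOYED         # "inactive" group added to the employed
    if p.inactive_unemployed:
        return Status.UNEMPLOYED       # "inactive" group added to the unemployed
    return Status.NOT_IN_LABOR_FORCE


def labor_force_totals(people):
    """Civilian labor force = employed + unemployed; total adds the armed forces."""
    counts = Counter(classify(p) for p in people)
    civilian = counts[Status.EMPLOYED] + counts[Status.UNEMPLOYED]
    return {"civilian_labor_force": civilian,
            "total_labor_force": civilian + counts[Status.ARMED_FORCES]}
```

The sequence makes explicit that any work for pay during the week outweighs job-seeking, and that the 15-hour threshold applies only to unpaid work in a family-operated enterprise.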


EXPERIMENTAL WORK FOR PROBLEM GROUPS 


In the past 12 years much experimental work has been carried out 
under the guidance of interagency committees and subcommittees to 
explore a variety of problems in connection with the application of the 
labor force concepts described. A brief account of them may be helpful
as a basis for consideration of further steps that could be taken. Some 
of the research has pertained to what might be considered the “ground 
rules” covering such matters as age, time periods, etc., some with the 
collection of more detailed information on certain groups, and some 
with the collection of more information on the appreciable numbers of 
persons in certain marginal categories. 


QUESTIONS RELATING TO AGE LIMIT AND PERIODS OF REFERENCE 


Age. The exclusion of children under 14 years of age from the labor 
force statistics has been a matter of relatively little concern, except 
possibly to groups particularly interested in problems of child labor. 
In fact, there has been some pressure to raise the age limit to 16 or 17 
and to provide a cutoff at the other end of the scale, possibly at age 75, 
in order to eliminate some of the more marginal groups from the 
statistics. 

Nevertheless, substantial numbers of young children under 14 still
do farm work; and persons accustomed to finding their daily newspaper
at the door might pay tribute to the economic contribution of non- 
farm youngsters as well. On a few occasions, usually with outside 
sponsorship, the Bureau has collected labor-force statistics for children 
under 14 years of age.¹ In addition to their utility to groups inter-
ested in child-labor problems and possibly to labor input analysis,
these data have been of some importance in the reconciliation of the 
Census Bureau’s statistics with those based on employers’ or estab- 
lishment reports, which generally specify no age cutoff. 

Use of Week as Time Reference. The use of a calendar week as the 
time reference for labor-force surveys has had generally widespread 
support. Measurements based on a week are not only a reflection of 
current status but are also less subject to the memory biases which
arise in the use of a longer time period. When conditions are atypical 
during a specific week, however, and especially when that week repre- 
sents the only measurement obtained in a given month, there are cer- 
tain obvious disadvantages. Moreover, for some purposes, alternative 
types of measurements are desirable in order to obtain an adequate 
description of the complex labor market behavior of a large segment of 
the population. 

The occurrence of a legal holiday during the survey week, of course,
has a substantial impact on hours worked and will obscure month-to-
month movements in the workweek. Some experimental studies have
been made to measure the effect of holidays on hours distributions
and man-hours worked, and it is believed that rough adjustment factors
could be developed through further research of this type.

Some of the limitations of the use of the calendar week approach are 
also met by a regular annual inquiry on work experience during the 
preceding calendar year. Persons are classified according to the number 
of weeks in which they worked during the year, whether they were 
working full time or part time while employed, and according to their
principal occupational and industrial attachment. Although informa- 
tion of this type is somewhat out of date by the time it is compiled 
and contains deficiencies because of memory factors, it does in some 
ways provide a more comprehensive description of the adequacy of 
employment for the period covered than does an accumulation of the 
monthly statistics. Moreover, for specific individuals, it may provide 
a more meaningful and typical classification.²





1 For some of the results, see U. S. Department of Labor, Bureau of Labor Statistics, Caution: 
Children Under 14 at Work, January 24, 1951. 

2 For a summary of these surveys, see U. S. Bureau of the Census, Current Population Reports,
Series P-50, Nos. 8, 15, 24, 35, 43, 48, and 54; the first three of these studies were based on a somewhat 
more limited approach than that described above. 







A good illustration of the differences revealed by the two approaches 
may be found in an examination of the statistics for white and nonwhite
women. Based on the calendar week approach, the proportion
of nonwhite women who are in the labor force has historically exceeded 
that of white women by a wide margin, suggesting more intensive 
economic activity on the part of the former. The annual surveys indi- 
cate, however, that the participation of nonwhite women is on a much 
more intermittent basis and that the annual per capita labor input of 
the two groups would show much less variation. In 1953, for example, 
only one-quarter of the nonwhite women with work experience had 
full-time work the year round; the comparable proportion for white 
women was 40 per cent. In any evaluation of these relationships, it 
should be borne in mind that for persons with intermittent employment 
the reports upon work during a preceding year are subject to consider- 
able ranges of error. 

The usefulness of a labor-force classification based upon a particular 
week has been much more strongly challenged in connection with the 
Decennial Census, where the results are available too late to be of cur- 
rent interest and where a classification based upon a longer time period 
may be in order. In certain undeveloped countries where a large pro- 
portion of the population is engaged in its own account work or in family 
employment, and where the economic fluctuations of more industrial- 
ized countries are lacking, the weekly time reference is subject to 
serious question. 

Hour Limits for Determining Employment. The fact that a person 
working as little as one hour in the week is classified as employed is 
frequently criticized. Many critics seem to feel that a person doing a 
trivial amount of work in a week ought not to be included in the na- 
tion’s labor force. This type of difficulty, of course, is encountered at 
the cutting edge of many classifications, and the provision of tables 
classifying the employed by hours of work makes it possible to deter- 
mine the effect of alternative definitions. 

A more serious point is that many of the persons who are working 
short hours are doing so involuntarily and want more work. This 
problem is discussed later in a consideration of the category inter- 
mediate between employment and unemployment. 

Fifteen Hour Limit for Unpaid Family Workers. The present lower 
limit of 15 hours work in the week for inclusion of unpaid family 
workers among the employed was introduced after considerable experi- 
ence with qualitative rules to eliminate from the labor force persons 
with only incidental duties on farms and in family businesses. It is
believed to have worked reasonably well in giving greater stability in
this part of the labor force. Of course, the difficulty of securing precise
reporting of hours worked, especially for intermittent workers, in- 
creases the measurement problems associated with any cutoff of this 
type. 
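
The hour limits just described can be written as a simple classification rule. The following is a minimal sketch and not the Bureau's actual procedure: the function name `counts_as_employed`, the record layout, and the sample hours are hypothetical, and only the one-hour and 15-hour limits are taken from the text. Changing a limit shows how a table of the employed by hours of work lets one gauge the effect of an alternative definition.

```python
def counts_as_employed(hours_worked, unpaid_family_worker=False,
                       general_min_hours=1, unpaid_family_min_hours=15):
    """Return True if a person at work meets the applicable hour limit."""
    if unpaid_family_worker:
        minimum = unpaid_family_min_hours
    else:
        minimum = general_min_hours
    return hours_worked >= minimum


# Hypothetical survey-week records: (hours worked, unpaid family worker?)
records = [(1, False), (10, True), (15, True), (40, False), (0, False)]

# Count under the limits described in the text.
employed = sum(counts_as_employed(h, u) for h, u in records)

# Count under an alternative definition requiring 15 hours of everyone.
employed_alt = sum(
    counts_as_employed(h, u, general_min_hours=15) for h, u in records
)
```

With these invented records the one-hour worker is the only person whose classification changes under the stricter alternative.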

Part-Time Employment Often a Combination of Working and Looking 
for Work. The need for further analysis of part-time workers has led 
to a good deal of experimentation since 1946 to determine the reasons 
for part-time employment. Persons working less than 35 hours during 
the survey week were separated into those who usually worked full 
time at their current jobs and those who usually worked part time. 
For those normally on full time, the reasons for their reduced hours 
during the survey week were ascertained. The others were asked 
whether they preferred and could have accepted full-time employment. 
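
The sequence of questions described above amounts to a small decision tree. The sketch below is illustrative only: the function name, argument names, and category labels are assumptions made for this example, while the 35-hour threshold and the order of the questions follow the text.

```python
FULL_TIME_HOURS = 35  # survey-week threshold stated in the text


def classify_by_hours(hours, usually_full_time, economic_reason=False,
                      wants_full_time=False):
    """Assign one person at work to a category (labels are illustrative)."""
    if hours >= FULL_TIME_HOURS:
        return "full time in survey week"
    if usually_full_time:
        # Reduced week: the reason for the shorter hours decides the group.
        if economic_reason:
            return "usually full time, reduced week for economic reasons"
        return "usually full time, reduced week for other reasons"
    # Regular part-time worker: asked about preference and availability.
    if wants_full_time:
        return "usually part time, prefers and could accept full time"
    return "usually part time, other"


# Hypothetical respondents.
sample = [
    classify_by_hours(40, True),
    classify_by_hours(24, True, economic_reason=True),
    classify_by_hours(30, True),
    classify_by_hours(20, False, wants_full_time=True),
    classify_by_hours(8, False),
]
```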

These special investigations—which have been conducted quarterly 
during periods of declining economic activity and less frequently at 
other times—have been among the most successful of the innovations 
in the survey.³ The principal additional categories established thereby
have been the number of full-time workers on reduced workweeks
because of slack work, material shortages, job turnover, and other 
economic factors and the number of regular part-time workers who 
preferred and could have accepted full-time employment. The first of 
these groups—full-time workers on shortened workweeks—has proved 
to be a highly sensitive indicator of economic trends, often signalling 
important changes before they were reflected in either the over-all em- 
ployment or unemployment estimates. The second group—part-time 
workers desiring full-time work—is somewhat less sensitive to these 
trends, because it includes a hard core of persons in perpetually un- 
stable occupations such as domestic service, but nevertheless shows 
significant increases during periods of declining activity. The special 
studies have also revealed that, regardless of business conditions, the 
large majority of persons working part time are doing so by choice or for 
various personal reasons; this is true especially of those reporting only 
a very few hours of work during the survey week. 

The diagnostic value of data on part-time employment is somewhat 
limited because the figures have not been available on a monthly basis 
and because interpolation between the periodic estimates has often 
been difficult. Serious consideration, however, is being given to obtain- 
ing the information monthly. It is noteworthy that the Dominion of 
Canada has adopted a somewhat modified version of these special
questions for its monthly enumeration and that the additional classifications
have become an integral part of the conceptual framework.⁴



3 For results of these surveys, see U. S. Bureau of the Census, Current Population Reports, Series
P-50, Nos. 7, 12, 17, 18, 21, 25, 26, 28, 33, 34, 46, 52, 53, and 55.


DIVISION BETWEEN “WITH A JOB” STATUS AND “UNEMPLOYED” STATUS 


Controversy over the classification of certain persons as “with a job” 
has arisen primarily because of the thought that some persons in this 
classification might more properly be regarded as unemployed. There is
comparatively little disagreement about the present classification for 
those absent because of vacations, illness, or other personal reasons, 
assuming that they can be separately identified so as not to obscure 
labor-input analysis. Substantial, although not universal, agreement 
exists also in the case of those idle because of bad weather or labor 
disputes. 

Most of the discussion has centered around two of the smaller com- 
ponents—persons on temporary (less than 30 day) layoff and those 
waiting to report to new jobs scheduled to start within 30 days. The 
rationale for inclusion of these persons with the employed, and the 
counterarguments for classification as unemployed or even (in the case 
of those awaiting new jobs) as not in the labor force have been described
in great detail elsewhere⁵ and will not be repeated here. In recognition
of this long-standing dispute, however, the Bureau in 1945
started publishing and displaying in a reasonably prominent position in
its releases, separate statistics for these categories, so that they could 
be treated in accordance with the needs and viewpoints of different 
consumers. More recently, the amount of detail tabulated for these 
persons has been expanded considerably, which should expedite re- 
grouping if a change in classification is indicated. 

The existence of a substantial miscellaneous group of persons absent 
from work in any given week has also been a source of some uneasiness. 
This “catch-all” group includes persons not working for a variety of 
personal or other reasons, no single one of which is sufficiently im- 
portant to warrant separate classification. The total has ranged be- 
tween 300,000 and 500,000 in the past several years and is currently 
somewhat above the level of a year ago. Present plans are to re- 
examine the various components of this group and to determine whe- 
ther any meaningful sub-classifications could be established; if so, they 
would be shown separately in the statistics. 

Although attention has been directed primarily at the categories
just mentioned, there is some sentiment for a general overhauling of the
entire concept of persons “with jobs but not at work.” The concept implies
a very specific job attachment from which the person was only
“temporarily” absent. Aside from the “layoff” and “new” job segments,
however, where a 30-day cutoff is specified, the term “temporary
absence” has never been defined.



4 Dominion Bureau of Statistics, The Labour Force.
5 See, in particular, Social Science Research Council, Labor Force Definition and Measurement, by
Louis J. Ducoff and Margaret Jarman Hagood, Bulletin No. 56, 1947.

In March 1954, an experimental study was conducted to determine 
the expected duration of absence for persons in the “with a job” group 
in the interest of evaluating the concept and possibly tightening the 
criteria. The results showed, for those reporting, that about 75 per 
cent expected to be back at work within 30 days of the start of their
absence and 82 per cent within 60 days. The proportion whose anti- 
cipated absence exceeded 60 days was largest among those not working 
because of bad weather or for miscellaneous reasons, probably because 
both these groups include many seasonal workers who ordinarily do 
not work during the winter months. The rules specify that seasonal 
workers should not be reported as having jobs during the off season, and 
some need for tightening this provision may be indicated by the results. 
Absences of substantial duration because of illness were also reported 
in a number of cases, suggesting some need for clarifying the concept 
of “temporary” illness.⁶

A further aid in evaluating the concept would be information on the 
specific nature of the job arrangements of persons in the “with a job” 
category. Information of this type may be obtained in the future as a 
byproduct of a system of quality control now being developed by the 
Bureau. For this purpose, a highly detailed questionnaire may be used 
by the field supervisory staff in rechecking the survey results. 


DIVISION BETWEEN THE UNEMPLOYED AND PERSONS NOT IN THE 
LABOR FORCE 


The greatest amount of controversy over labor force concepts has 
involved the classification of persons who may frequently cross and 
recross the line of division between being definitely outside the labor 
force and being in the market for a job. Some consumers would prefer 
the inclusion as labor force members of all persons willing and able to 
work; others would restrict the concept to those actually working or 
intensively seeking jobs. The concepts adopted in 1940 sought to steer 
a middle course, establishing some standards of objectivity, without 
imposing severe restrictions based on intensity of job-seeking efforts 
and similar criteria and with some recognition of special circumstances.



6 The results are under review by an interagency committee re-examining labor-force concepts.



Although the concepts represented a distinct advance from the 
standards used previously, experience in their application has demon- 
strated that there are many borderline situations which continue to be 
troublesome, if not insoluble. It has been suggested that the notion of a 
distinct line of demarcation between workers and nonworkers may be 
fallacious in itself, and that we merely have a continuum reflecting 
different degrees of association or lack of association with the labor 
market. 

The Bureau has conducted a number of experiments designed to 
measure the size and composition of the group on the fringe of the 
labor market which, under current practices, is excluded from the labor 
force count. Although the approach has varied, the usual procedure
has been to identify those persons originally reported as nonworkers 
who had worked or looked for work within the previous month or two, 
and to establish either the reason for their current inactivity or 
whether they still desire employment.⁷ The results have shown that
there is a small group—possibly ranging between 300,000 and 500,000 
—who resemble the “inactive” unemployed in some respects or who, 
without any serious departure from current concepts, could reasonably 
be considered as labor force members. This group consists largely of 
housewives and teen-age boys and girls and appears to increase slightly 
but not strikingly during periods of business downturn. Unfortunately, 
there have been no measurements of this type within the past four 
years; and some current checks, hopefully using a more imaginative 
approach, are needed. 


USE OF GROSS CHANGE DATA AS TOOL OF ANALYSIS 


The economic significance of changes in employment and unem- 
ployment may sometimes be obscured because of the existence of a 
large group of part-time and intermittent workers who move into and 
out of the labor force with considerable frequency in the course of a 
year. Seasonal adjustment of the series alleviates the problem to some 
extent; but such adjustments are at best only approximations. The 
availability of data by age, sex, and other characteristics is a sub- 
stantial aid, although demographic groups, of course, are far from 
homogeneous. 

An important advance in labor force analysis, which has both
identified the magnitude of this problem and pointed to a partial
solution, is the development of information on “gross changes” in
employment and unemployment. About 75 per cent of our sample is
common from month to month, thus providing a basis not only for
determining status at any given time but also for examining changes in
status for an identical group of persons. These tabulations show that,
regardless of how small net changes may be from month to month,
millions enter or leave the labor force or shift between an employed
and an unemployed status.⁸



7 For summary of these studies, see U. S. Bureau of the Census, Labor Force Memoranda, Nos. 3
and 4.

By these means it is possible to isolate, to some extent, changes of 
prime economic significance from those of perhaps secondary impor- 
tance. When unemployment is rising, for example, one may determine 
the extent to which the increase is due to job losses and the amount 
attributable to the entry of housewives, students, and others into the 
labor market in search of jobs. Conversely, when unemployment is 
dropping, job accessions and withdrawals from the labor market can 
be separately identified. Even where there is little apparent over-all 
change, gross movements may reveal certain significant underlying 
developments. 
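
The distinction between net and gross change can be illustrated with a small cross-tabulation of status in two consecutive months for the same persons. This is a sketch under stated assumptions: the six matched persons, their statuses, and the function name `gross_changes` are invented for the illustration and do not reproduce any Census tabulation.

```python
from collections import Counter


def gross_changes(last_month, this_month):
    """Cross-tabulate status in two months for the same matched persons."""
    return Counter(zip(last_month, this_month))


# Hypothetical statuses for six persons present in both monthly samples.
last = ["employed", "employed", "unemployed",
        "not in labor force", "employed", "unemployed"]
this = ["employed", "unemployed", "employed",
        "unemployed", "employed", "not in labor force"]

flows = gross_changes(last, this)

# Gross movements into and out of unemployment.
into_unemployment = sum(
    n for (before, after), n in flows.items()
    if after == "unemployed" and before != "unemployed"
)
out_of_unemployment = sum(
    n for (before, after), n in flows.items()
    if before == "unemployed" and after != "unemployed"
)
net_change = into_unemployment - out_of_unemployment

# Sources of the inflow: job losses versus entry into the labor market.
job_losses = flows[("employed", "unemployed")]
new_entrants = flows[("not in labor force", "unemployed")]
```

In this invented example the number unemployed is the same in both months, yet four of the six persons changed status, and the inflow divides evenly between a job loss and a new entrant.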

These data have been highly useful in interpreting month-to-month 
trends as well as in illustrating the dynamic nature of labor market 
activity. It is hoped that we may be able to expand the tabulations to 
cover changes in hours worked and other special subjects in addition 
to employment status. 

Although gross change tabulations to date have been confined to 
month-to-month comparisons, it will shortly become possible to extend 
this type of analysis to year-to-year changes. Time and space do not 
permit coverage of all the analytical possibilities that could be exploited 
by these means. Since year-to-year differences usually represent more 
permanent changes in status, it is clear that such matters as the pat- 
tern of entry into, and retirement from, the labor force, basic occupa- 
tional and industrial shifts, and many related subjects could be 
thoroughly explored for perhaps the first time. Both on an annual and a
monthly basis, there also remains the relatively untapped area of the 
motivation for these changes in status, which presumably could be 
pursued through supplementary questions. Here again, however, much 
experimentation would be needed in order to obtain meaningful results. 


RECONCILIATION OF CENSUS SERIES WITH OTHER GOVERNMENT SERIES 


In addition to the Census figures on the total numbers employed 
and unemployed, there are three other current government series in
the same general field: the Bureau of Labor Statistics (BLS) data on
employees of non-agricultural establishments; the Agricultural Marketing
Service (AMS) data on farm employment; and the Bureau of
Employment Security (BES) data on insured unemployment. It is
generally agreed that the series serve different purposes and thus
supplement one another in a very important fashion. Perhaps the best
indication of their joint usefulness is the fact that three of the four
series are being issued currently in a combined release by the Departments
of Labor and Commerce.

The differences between the Census series and the other three can
be explained in considerable part by technical considerations connected
with the fact that the Census series is based upon information collected
directly from households, whereas information from the other three
series is provided by employers. The major conceptual difference
between the Census and the other employment series arises from the
treatment of workers with more than one job in the reporting period.
Both the BLS and AMS statistics are collected from employers and
essentially represent a count of different jobs, whereas the Census
counts a person only once (in his major job) regardless of the number
of jobs held. Another major difference from the BLS series lies in the
treatment of persons with a job but not at work; BLS counts these
persons only if they received pay for the reporting period. Comparisons
between Census and AMS are also affected considerably by the inclusion
of children under 14 years of age in the latter series.

Tests which have been applied to the Census series to measure the
size of the conceptual differences between Census and BLS data have
given some information on the number of dual job holders and on the
numbers of persons on unpaid absences who would be excluded from
the BLS series if payroll data from companies do exclude all persons
receiving no pay in a particular payroll period.⁹ The conclusion emerging
from these studies is that conceptual differences are definitely not
the only and perhaps not even the major problem in this field and that
future efforts at reconciliation should give at least equal attention to
sampling and measurement techniques. It is hoped that after necessary
studies of sampling and other errors in both series have been completed,
there may be a better understanding of the relationships between them,
and their joint usefulness may be increased.



8 The magnitude of these gross movements may be exaggerated somewhat because of some inconsistency
in reporting for identical persons from one month to the next where no change in status has
actually occurred. Generally, the net of gross movements between two categories will be more accurate
than the gross movements themselves, although even the latter, in the case of unemployment for example,
are reasonably consistent with data from other sources.

9 Some consideration is being given to obtaining pay status on a regular or at least frequent basis
for persons with jobs but not at work during the survey week.







In the case of unemployment, the main emphasis has been upon the 
development of adjustment factors which would compensate for differences
in coverage between the Census estimates and the BES figures
on insured unemployment. This procedure has achieved fairly satisfactory
results.¹⁰ The problem of reconciliation here is most troublesome
at a time of large-scale withdrawals from the labor force, when
large numbers of women may be ready to continue the jobs which they
held during a period of emergency but are not seeking any other work.


SUMMARY 


In summary, the past 12 years have involved a great amount of research
under committee auspices to sharpen labor force concepts, to
develop better procedures for collecting the data, and to provide many 
types of supplementary information. Advances in methods, involving 
the most modern techniques of sampling and of electronic tabulation, 
have also been noteworthy. That much remains to be done is very 
clearly shown in the report of the Special Advisory Committee on 
Employment Statistics appointed by the Secretary of Commerce,¹¹
and in the coming months we feel sure that progress will be made along 
the following lines: 

1. Intensive study of concepts with the guidance of both Government
   and private advisory groups.

2. Intensive research on questionnaires, forms, etc., to see what
   advances in objectivity and reductions in non-sampling errors
   can be made.

3. Greater emphasis on quality control procedures to insure maintenance
   of proper standards of performance.





10 See Comparison of Census Bureau Estimates of Unemployment and Insured Unemployment Statis- 
tics, a statement submitted by the Bureau of the Census and the Bureau of Employment Security to the 
Joint Congressional Committee on the Economic Report, February 1954. 

11 “The Measurement of Employment and Unemployment by the Bureau of the Census in its Current
Population Survey,” Report of the Special Advisory Committee on Employment Statistics, August,
1954. 




EXAMINATION OF TWO SOURCES OF ERROR IN THE 
ESTIMATION OF NET INTERNAL MIGRATION* 


DANIEL O. PRICE
University of North Carolina 


Two sources of errors in the estimation of net internal mi- 
gration are examined. First, errors arising from the use of a 
single set of survival rates in all 48 states are examined by 
estimating net migration in ten states using both a national 
life table and state life tables. It is estimated that the median 
error arising from this source is about 14 per cent of the esti- 
mate of net migration. 

The second source of error examined is underenumeration 
of the population. The “built in” correction factor of Census 
survival rates is demonstrated algebraically, and one ap- 
proach is made to estimating the magnitude of errors from 
this source when the assumptions are not justified. It is esti- 
mated that about one-third of the estimates of net migration 
are in error by 25 per cent or more due to the effects of under- 
enumeration. 

These two estimates of error are quite rough and the main 
conclusion to be derived is that small relative differences in 
estimates of net migration should be interpreted with extreme 
caution. 


* Revision of a paper presented at the Annual Meeting of the Population Association of America,
Charlottesville, Virginia, May 8, 1954.


A great many man-hours have been spent in measuring, or rather
in estimating, net migration for various areas and population
groups. The methods used are generally familiar, that is, the natural
increase method and the survival rate or residual methods, survival
rates being computed either from life tables or Census figures and either
the forward, reverse, or combined procedure being used [2]. In both
the natural increase method and the survival rate method an expected
population is computed and compared with an observed population
and the difference is attributed to migration. This difference which is
attributed to migration, however, is really the result of several factors.
In nearly all cases migration is the largest single factor present so it is
proper to think of the other factors as causing errors in the estimate of
migration. We would like to know the relative importance of these
various sources of errors and would like to be able to say that for a
certain method of estimating net migration the standard deviation of
the errors is a certain amount, say, 10 per cent. That is, we could say
that about two-thirds of our estimates were within plus or minus 10
per cent of the true value. Given such an estimate of the variance of
the errors we would like to be able then to break up this variance into
its component parts somewhat as follows:


\[
\sigma^2 = u^2 + s^2 + f^2 + \cdots + e^2 \tag{1}
\]

where

$\sigma^2$ is the total error variance,

$u^2$ is the amount of the total error variance due to errors of underenumeration,

$s^2$ is the amount of the total error variance due to errors in survival
rates,

$f^2$ is the amount of the total error variance due to using the forward
method rather than some other, etc., and

$e^2$ is a residual amount of unexplained error.


If it were possible to assign values to each term in equation (1) we 
would then have an objective basis for planning further research to improve
the accuracy of estimates of net migration. That is, it would seem
logical to attack that source of error which contributed most to the 
total error variance. For example, if 90 per cent of the total error vari- 
ance came from errors in survival rates, then this would be the logical 
point on which to work for improvement rather than on some other 
source of error that accounted for only five per cent of the total error 
variance. This ideal, however, isn’t attainable. 

In the first place, the model given in equation (1) is oversimplified 
because it assumes that the sources of error are independent of each 
other. This is a simplifying assumption and may not be reasonable. 
It is imaginable that those areas where underenumeration is worst 
might be the same areas for which the survival rates are most in error. 
If these sources of error are correlated, then we must add a cross- 
product term involving the correlation coefficient between these two 
sources of error. If two sources of error are negatively correlated, they 
tend to compensate for each other and the total error is reduced; but 
if they are positively correlated, the total error is increased. 
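
For two sources only, the cross-product term just mentioned can be written out explicitly. This is a sketch under the assumption that the errors from underenumeration and from survival rates, denoted here $E_u$ and $E_s$ (names not used in the original), are additive, have variances $u^2$ and $s^2$, and have correlation coefficient $\rho$:

```latex
\operatorname{Var}(E_u + E_s) = u^2 + s^2 + 2\rho\,u\,s
```

With $\rho = 0$ this reduces to the corresponding two terms of equation (1); a negative $\rho$ lowers the total error variance and a positive $\rho$ raises it.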

In addition to the fact that equation (1) is oversimplified, we are not 
able to evaluate the terms of it, much less determine the amount and 
direction of correlation between the sources of error. However, it is felt 
that efforts to evaluate the terms of (1) or a modified version of it are 
worth the attention of demographers. The balance of this paper is a
report of efforts to evaluate or show the importance of two sources of
error in estimating net migration by the survival rate method. 







SOURCES OF ERROR IN ESTIMATING MIGRATION 691 


Let us first look at the effect of ignoring regional differentials in 
mortality. Whenever we use the same set of survival rates for all states 
we are ignoring differences in rates of mortality from one state to an- 
other. For example, we assume that the average rate of mortality for 
the 20-24 age group holds for all states. When we apply this average 
rate of mortality in a state that actually has a higher rate of mortality, 
we expect more survivors at the end of the decade than there really are, 
and we attribute the deficit to out-migration. The reverse happens in a
state with the mortality rate lower than average.
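
The mechanism can be made concrete with a small numerical sketch of the forward survival rate computation. The cohort size and both ten-year survival rates below are hypothetical figures chosen for illustration, and the function name is an assumption of this example; only the logic (expected survivors compared with the enumerated population) comes from the text.

```python
def net_migration_forward(pop_start, pop_end_observed, survival_rate):
    """Forward method: observed population minus expected survivors."""
    expected_survivors = pop_start * survival_rate
    return pop_end_observed - expected_survivors


# Hypothetical cohort in a state with no migration and above-average
# mortality: 100,000 persons at the start, 93,000 enumerated a decade later.
pop_start = 100_000
pop_end_observed = 93_000

average_rate = 0.95  # hypothetical survival rate applied to all states
state_rate = 0.93    # hypothetical survival rate for this state alone

estimate_average = net_migration_forward(pop_start, pop_end_observed, average_rate)
estimate_state = net_migration_forward(pop_start, pop_end_observed, state_rate)
```

Under these assumed figures the average rate produces an apparent net out-migration of about 2,000 persons, whereas the state's own rate produces an estimate of about zero; the whole difference is an error of the kind discussed here.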

In order to evaluate the magnitude of errors arising from this source,
we have computed estimates of net migration for ten states by two dif- 
ferent methods. The ten sample states were selected by taking every 
fifth state from a regional listing of the 48 states. For each of these 
ten states net migration of native white males was estimated using 
survival rates taken from a United States life table for white males 
[3], the same set of survival rates being used for all ten states. These
estimates of net migration are shown in column 1 of Table 1. Estimates 
of net migration were then made using survival rates taken from indi- 
vidual state life tables [3]. These estimates are shown in column 2 
of Table 1 and take into consideration state differences in rates of
mortality.


TABLE 1

ESTIMATES OF ERRORS DUE TO USING A SINGLE SET OF SURVIVAL
RATES IN THE ESTIMATION OF NET INTERNAL MIGRATION
OF NATIVE WHITE MALES IN TEN SAMPLE STATES,
1930-1940

  Estimated Net Migration Using Survival Rates From
  --------------------------------------------------                      Error as Per Cent
  U. S. life       State life        U. S.           Estimated Error      of Net Migration
  table, 1930      tables, 1930      Census          (2) - (1)            (4) ÷ (3)
      (1)              (2)             (3)              (4)                   (5)

      7,790           15,553           6,375            7,763                121.8
    -39,425          -43,317         -44,302            3,892                  8.8
    106,859          108,721         102,672            1,862                  1.8
    -15,159           -9,522         -38,097            5,637                 14.8
     -6,383          -12,167         -20,405            5,784                 28.3
     -3,566           -3,360          -4,878              206                  4.2
      8,200            9,235           7,854            1,035                 13.2
     -5,561           -6,426         -13,541              865                  6.4
    -94,314          -71,472        -122,307           22,842                 18.7
      2,574            3,225             911              651                 71.5










In getting these two estimates of net migration between 1930 and 
1940, life tables for 1930 were used. It would have been desirable to 
use an average of the 1930 and 1940 life tables. However, our interest is 
in the difference between the two methods, and it is assumed that the 
difference between the two methods would be relatively constant re- 
gardless of whether 1930 life tables were used, 1940 life tables were 
used, or an average of the two. There is some reason to believe that 
state differences in mortality are decreasing, and if this is true, errors 
arising from this source will decrease. 

It is not felt that life table survival rates give as good estimates of 
net migration as Census survival rates [1]; therefore column 3 of Table
1 gives estimates of net migration for the ten sample states computed 
using Census survival rates. We assume that the differences between 
corresponding figures in columns 1 and 2 are good estimates of the errors 
arising from the use of a single set of survival rates, and these differences 
are shown in column 4. We can express the absolute values of these 
errors as percentages of the number of migrants, and these percentages 
are shown in column 5. 

This method of expressing the error (as a percentage of the net migra- 
tion) can, theoretically, lead to division by zero in the case where there 
is no net migration. However, none of the 48 states actually has zero 
net migration and the use of positional summarizing measures will tend 
to offset the effect of any extreme values arising from a small net migra- 
tion. One might expect that the largest percentage errors would tend 
to occur in those states having the smallest amount of net migration, 
but a rank order correlation gives little indication of such an association. 
(Kendall’s Tau for the sample is .24. The probability of a sample Tau 
being equal to or larger than this when there is no association in the 
universe is about .32.) 

Utilizing the percentage errors of column 5 in Table 1 it is possible 
to make an estimate of the median percentage error and to set confi- 
dence limits on this estimate, the confidence limits being set without 
any assumptions as to the form of the distribution [4]. Our estimate 
of the median percentage error is 14.0 per cent and the 89 per cent 
confidence limits are 6.4 per cent and 28.3 per cent. Some critics might 
object to using Census survival rate estimates of net migration as a
base when the errors are based on life table survival rates. The estimates 
of net migration based on state life tables can be used as the base in 
computing the percentage error. Such computations lead to an esti- 
mated median error of 16.8 per cent with 89 per cent confidence limits 
of 9.0 per cent and 47.5 per cent. 
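
The medians and distribution-free limits quoted above can be checked from the figures in columns 1 to 3 of Table 1. The sketch below is only an illustration: it assumes the limits are the third smallest and third largest of the ten percentage errors, a choice inferred from the quoted 89 per cent level rather than stated in the text, and the function names are hypothetical.

```python
from math import comb
from statistics import median

# Columns (1)-(3) of Table 1: estimated net migration of native white males,
# ten sample states, by source of survival rates.
US_LIFE = [7790, -39425, 106859, -15159, -6383, -3566, 8200, -5561, -94314, 2574]
STATE_LIFE = [15553, -43317, 108721, -9522, -12167, -3360, 9235, -6426, -71472, 3225]
CENSUS = [6375, -44302, 102672, -38097, -20405, -4878, 7854, -13541, -122307, 911]


def percentage_errors(base):
    """Absolute error |(2) - (1)| as a percentage of |base|, state by state."""
    return [100 * abs(s - u) / abs(b) for u, s, b in zip(US_LIFE, STATE_LIFE, base)]


def median_with_limits(values, k=3):
    """Median, k-th smallest and k-th largest values, and the coverage of that
    interval for the population median (sign-test argument, no distributional
    assumption). k=3 is an assumption inferred from the 89 per cent level."""
    ordered = sorted(values)
    n = len(ordered)
    # Probability that fewer than k of n observations fall below the median.
    tail = sum(comb(n, j) for j in range(k)) / 2 ** n
    return median(ordered), ordered[k - 1], ordered[n - k], 1 - 2 * tail


for label, base in (("Census base", CENSUS), ("State life table base", STATE_LIFE)):
    mid, low, high, coverage = median_with_limits(percentage_errors(base))
    print(f"{label}: median {mid:.1f}, limits {low:.1f} to {high:.1f}, "
          f"coverage {coverage:.2f}")
```

With ten observations, the interval between the third and eighth ordered values covers the population median with probability 1 - 2(56/1024), or about .89, whatever the form of the distribution.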





SOURCES OF ERROR IN ESTIMATING MIGRATION 693 


These findings lead us to the conclusion that half of our estimates of
net migration of white males are in error by more than 14 per cent. 
At least it would seem safe to say that about half of our estimates of 
net migration include an error greater than a figure of the order of 10 to 
20 per cent. 

We must keep in mind that these figures are only for white males
during the 1930-1940 decade and are rough estimates at best. Whether 
similar results would be gotten with other population groups and in 
other decades is unknown. The important conclusion to be derived 
is that the errors resulting from using a single set of survival rates for 
all states are not negligible. This finding also implies that most present 
estimates of net migration which have been computed using a single 
set of survival rates for all states are by no means exact and due cau- 
tion should be exercised in comparisons and interpretations. 

We must also remember that all of our findings here have been on the 
basis of total net migration rather than net migration by age groups. 
The errors in the estimation of net migration by age groups are possibly 
larger than in the estimation of total net migration; this fact should be 
kept in mind in using figures on net migration by age. 

Another important source of error in estimates of net migration is 
underenumeration of population. As was mentioned above, we usually 
estimate net migration by computing an expected population, compar- 
ing it with an observed population, and assuming that the difference is 
due to migration. A good deal of attention has been given to methods of 
computing the expected population, but little attention has been given 
to the effects of errors in enumeration of the observed population. The 
second part of this paper makes an effort to examine the relative im- 
portance of underenumeration as a source of errors in estimation of 
net migration. The term underenumeration is used here to include mis- 
statements of age and other misclassifications since from a practical 
point of view it is impossible to separate them. For example, if we 
get an undercount in a particular age group, there is no way to know 
how much of this undercount is due to people who were not counted 
at all and how much of it is due to people who should have been counted 
in this group but were counted in some other age group. Because of 
this misclassification by age and other categories it is possible to get an 
overcount in an age group, but we will use the general term under- 
enumeration to include all of these situations. 

Information on underenumeration is not available in sufficient detail 
for us to estimate empirically the magnitude of the errors from this 
source as was done above for survival rates. It is possible, however,
to demonstrate algebraically the part that underenumeration plays in
estimates of net migration. If we let

A = U. S. native population in age group z at end of decade

B = U. S. native population in age group z - 10 at beginning of decade

C = state native population in age group z - 10 at beginning of decade

D = state native population in age group z at end of decade

A/B = survival rate, R


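Net migration in state i is

\[ M_i = D_i - \frac{A}{B}\, C_i \]

A, B, C, and D are measured with errors a, b, c, and d so that we know
only (A - a), (B - b), etc., or

\[ A\left(1 - \frac{a}{A}\right),\quad B\left(1 - \frac{b}{B}\right),\quad C_i\left(1 - \frac{c_i}{C_i}\right),\quad D_i\left(1 - \frac{d_i}{D_i}\right) \]

Our estimate of net migration in state i is

\[ M_i' = D_i\left(1 - \frac{d_i}{D_i}\right) - \frac{A}{B}\cdot\frac{1 - a/A}{1 - b/B}\, C_i\left(1 - \frac{c_i}{C_i}\right) \tag{3} \]

We can look at this equation by parts in order to see the effect of
various sorts of underenumeration. Let us first look at the middle of
the second term, that is

\[ \frac{1 - a/A}{1 - b/B} \]
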
This represents the amount of error in the survival rate. If the
underenumeration in the United States population in the age group at the
beginning of the decade is the same as the underenumeration in the
same cohort at the end of the decade, then there is no error in the
survival rate from this source. However, the rationale in using Census
survival rates for estimating net migration is not that (a/A) = (b/B)
but that the undercount or overcount of an age group in a particular
state will be approximately the same as the undercount or overcount
of the corresponding group in the U. S. Put into symbols this is

\[ \frac{d_i}{D_i} = \frac{a}{A}, \qquad \frac{c_i}{C_i} = \frac{b}{B} \]

If these conditions hold, then formula (3) becomes

\[ M_i' = \left(D_i - \frac{A}{B}\, C_i\right)\left(1 - \frac{a}{A}\right) \]


This demonstrates the “built in correction” of Census survival rates. 
However, even when our assumptions hold, the estimate of net migra- 
tion is in error by the same percentage as the underenumeration of the 
cohort at the end of the decade. This is an error in the estimation of the 
absolute number of net migrants. Horace Hamilton has pointed out to 
the writer, however, that when the number of net migrants is related 
to the population at the end of the decade in order to get a “migration 
rate,” the error cancels out and the rate is correct. Dr. Hamilton has 
also demonstrated that if a reverse solution, or “revival method,” 
is used, the error still cancels out when a rate is computed.
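
The "built in correction" and the cancellation in the migration rate can be checked numerically. The following is a minimal sketch: the population figures and underenumeration proportions are hypothetical, chosen only so that the state and national undercounts agree as the rationale above requires, and the function name is an assumption.

```python
def estimated_net_migration(A, B, C_i, D_i, a=0.0, b=0.0, c_i=0.0, d_i=0.0):
    """Survival-rate estimate of net migration when the counts fall short of
    the true values A, B, C_i, D_i by the errors a, b, c_i, d_i."""
    computed_survival_rate = (A - a) / (B - b)
    return (D_i - d_i) - computed_survival_rate * (C_i - c_i)


# Hypothetical true populations (not taken from the paper).
A, B = 9_000_000.0, 10_000_000.0   # U.S. cohort at end and beginning of decade
C_i, D_i = 200_000.0, 190_000.0    # state cohort at beginning and end of decade

# Hypothetical undercounts: a/A = d_i/D_i and b/B = c_i/C_i.
under_end, under_start = 0.03, 0.05
a, d_i = under_end * A, under_end * D_i
b, c_i = under_start * B, under_start * C_i

true_migration = estimated_net_migration(A, B, C_i, D_i)
estimate = estimated_net_migration(A, B, C_i, D_i, a, b, c_i, d_i)

# The absolute number is short by the end-of-decade underenumeration ...
print(true_migration, estimate, estimate / true_migration)
# ... but the rate, relative to the enumerated end-of-decade population, is not.
print(true_migration / D_i, estimate / (D_i - d_i))
```

Under these assumed figures the estimate is 97 per cent of the true net migration, while the two rates agree.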

The real problem arises in an attempt to estimate the extent to which 
the underenumeration of an age group in a state departs from the under- 
enumeration in the corresponding age group in the entire United 
States. The logical approach to this problem is through the study of the 
variation among states in underenumeration for particular age groups. 
We do not, however, have complete information on underenumeration 
by age and by states. (If we had this, of course, we could adjust our 
estimates of net migration and avoid this source of error.) The first 
approach to this problem was to assume that the coefficient of variation 
of underenumeration would be relatively constant for most age groups. 
This is to say that the greater the underenumeration of an age group for 
the total United States, the greater will be the variations of states 
about this average underenumeration, and that the ratio of the stand- 
ard deviation of the variations to the average underenumeration will 
be approximately constant from one age group to another. It is possible 
to make some tests of this hypothesis utilizing scattered data on
underenumeration. The results of such tests using the available data are
shown in Table 2. 

An examination of this table does not support the assumption outlined
above. Three out of the four groups, however, have approximately
the same standard deviation regardless of the average level of
underenumeration. The fourth group, Negro males of draft age, has a much
larger standard deviation but this might be a function of the fact that
the 23 states used in computing this figure were those with the largest
Negro populations. Since this is not necessarily the case, however, the
figures of Table 2 do not give unqualified support to the hypothesis that
the standard deviation of underenumeration is a constant.


TABLE 2
MEANS AND STANDARD DEVIATIONS OF PER CENT
UNDERENUMERATION BY STATES FOR FOUR
POPULATION GROUPS

                                       Mean Per Cent                  No. of
Population Group                       Under-          Standard       States
                                       enumeration     Deviation      Covered

Whites under 5 years of age,
  1940*                                     6.10       [illegible]       47
Nonwhite under 5 years of
  age, 1940*                               15.42       [illegible]       16
Males 21-35 years of age
  inclusive, 1940†                          2.00          2.83           48
Negro males 21-35 years of
  age inclusive, 1940†                     15.87          7.08           23

* Sixteenth Census of the United States, 1940. Population. Differential Fertility, 1940 and 1910.
Table A-2.

† Daniel O. Price, "A check on underenumeration in the 1940 census," American Sociological
Review, XII (February 1947), 44-9.

Let us assume, however, that the three smaller standard deviations of 
Table 2 give us some basis for estimating the standard deviation of 
errors of underenumeration among states. Using this assumption, we 
can estimate the effects of errors of underenumeration. We begin by 
making the following substitutions in equation (3).

\[ \alpha_i = 1 - \frac{d_i}{D_i}, \qquad \beta_i = 1 - \frac{c_i}{C_i}, \qquad K = \frac{1 - a/A}{1 - b/B}, \qquad R = \frac{A}{B} \]

Then our estimate of net migration is

\[ M_i' = D_i \alpha_i - R K C_i \beta_i \]

where

\[ M_i = D_i - R C_i \]

The error due to underenumeration is

\[ M_i - M_i' \]

We express this error as a proportion of the population at the end
of the decade thus:

\[ E_i = \frac{M_i - M_i'}{D_i} \]

We assume that this error is uncorrelated with the amount of migra- 
tion and therefore wish to determine the standard deviation of E and 
relate it to the mean proportion of the population classed as migrants 
and get a coefficient of variation. 


\[ E_i = \frac{(D_i - R C_i) - (D_i \alpha_i - R K C_i \beta_i)}{D_i}
       = (1 - \alpha_i) - R K \frac{C_i}{D_i}\left(\frac{1}{K} - \beta_i\right) \tag{4} \]

We could substitute typical values of $\alpha$, $\beta$, R, K, C, and D in this
formula and make an estimate of the average value of $E_i$, but it seems
more useful instead to estimate the variation in E, and this we will
proceed to do.

The ratio $C_i/D_i$ is the ratio of the population aged z - 10 in a state
at the beginning of a decade to the number left in a state in the same
cohort at the end of the decade. Without migration this term will
always be less than 1.0, but with migration it will vary over a range
from about .8 to 1.1 for those age groups most affected by migration.
Let us call this ratio $P_i$ and assume that it has a standard deviation of
.06 since this is consistent with the range assumed above.

The term $(1 - \alpha_i)$ is simply $d_i/D_i$, or the underenumeration of $D_i$
expressed as a proportion of the population. Let us call this term $\delta_i$.
Table 2 gives us reason to assume that the standard deviation of this
term is about .027. Let us assume a normal distribution.

The term $[(1/K) - \beta_i]$ has the same variation as $\beta_i$, which has the
same variation as $c_i/C_i$. Let us call this term $\gamma_i$. We are assuming that
the variation of $c_i/C_i$ is the same as that of $d_i/D_i$ and that $c_i/C_i$ is
probably correlated with $d_i/D_i$. We are also assuming that both of these
terms are independent of $C_i/D_i$. We can now write that


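\[ E_i = \delta_i - R K P_i \gamma_i \]

The variance of E, $\sigma^2(E)$, can be written

\[ \sigma^2(E) = \sigma^2(\delta_i) + R K \sigma^2(P_i \gamma_i) - 2 R K r\, \sigma(\delta_i)\, \sigma(P_i \gamma_i) \]

where r is the correlation between $\delta_i$ and $(P_i \gamma_i)$. Since $P_i$ and $\gamma_i$ are
independent, this can be written as

\[ \sigma^2(E) = \sigma^2(\delta_i) + R K \sigma^2(P_i)\, \sigma^2(\gamma_i) - 2 R K r\, \sigma(\delta_i)\, \sigma(P_i)\, \sigma(\gamma_i) \tag{5} \]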


We now have the variance of the errors due to underenumeration 
expressed in terms of the variance of the individual components and 
all that remains is to evaluate the expression. Substituting the assumed 
values in expression (5) gives us 


\[ \begin{aligned}
\sigma^2(E) &= .000729 + RK(.0036)(.000729) - 2RKr(.027)(.06)(.027) \\
            &= .000729 + RK(.000002624 - 2r\,(.00004374))
\end{aligned} \tag{6} \]


We can assume various values for R, K, and r and evaluate equation
(6). For most age groups the true survival rate, R, will be between .85
and .95. It also seems reasonable to assume that the ratio, K, of the
computed survival rate to the true survival rate would be between
.9 and 1.1. The correlation between $\delta_i$ and $P_i \gamma_i$ is probably between 0
and 1. If $\sigma(E)$ is computed for all combinations of R, K, and r within
the ranges specified above, it varies only from .025 to .027. Thus we
have an estimate of the standard deviation of the errors arising from
underenumeration.
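
The range quoted for the standard deviation can be reproduced by evaluating equation (6) at the corners of the assumed ranges of R, K, and r. The sketch below follows the equation as printed; the `exact_scaling` switch is an addition of this illustration, not part of the paper, and shows that weighting the middle term by (RK) squared, as the variance of a rescaled variable strictly requires, leaves the rounded result unchanged.

```python
from itertools import product
from math import sqrt

SD_DELTA = 0.027   # assumed standard deviation of d_i/D_i (from Table 2)
SD_P = 0.06        # assumed standard deviation of the ratio C_i/D_i
SD_GAMMA = 0.027   # assumed standard deviation of c_i/C_i


def sigma_e(R, K, r, exact_scaling=False):
    """Standard deviation of E from equation (6).

    With exact_scaling=False the middle term is weighted by R*K, as printed;
    with exact_scaling=True it is weighted by (R*K)**2 (an assumption of this
    sketch, not the paper's formula)."""
    weight = (R * K) ** 2 if exact_scaling else R * K
    variance = (SD_DELTA ** 2
                + weight * SD_P ** 2 * SD_GAMMA ** 2
                - 2 * R * K * r * SD_DELTA * SD_P * SD_GAMMA)
    return sqrt(variance)


# The variance is linear in R*K and in r, so its extremes lie at the corners.
corners = list(product((0.85, 0.95), (0.9, 1.1), (0.0, 1.0)))
printed = [sigma_e(R, K, r) for R, K, r in corners]
exact = [sigma_e(R, K, r, exact_scaling=True) for R, K, r in corners]

print(round(min(printed), 3), round(max(printed), 3))
print(round(min(exact), 3), round(max(exact), 3))
```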

From this we can estimate that about one-third of the errors in
estimates of net migration arising from underenumeration are more than
2.5 per cent of the population in the state at the end of the decade.
Actually the errors are larger than this because this is only the variation
of the errors about some average error. However, if we make the
assumption that the average error is zero we still have errors of the
magnitude described above.

If we relate this estimated standard deviation of errors to the mean 
net migration, we can get a coefficient of variation. However, we must 
remember that these estimates of underenumeration are based on age 
groups and therefore we must relate this estimated standard deviation 
to mean net migration by age. Since one of our estimates of under- 
enumeration comes from the 21-35 age group, let us relate our estimate 
of error to the mean net migration of the 20-35 age group. This age 
group has one of the highest rates of net migration; therefore, we will 
tend to get an estimate of minimum error. During the 1930-40 decade
the net migration in the 20-35 age group was between 9 and 10 per cent 
of the 1940 population for those states having net out-migration as well 
as for those states having net in-migration in this age group. If we use 
10 per cent as our estimate of the mean net migration in this age group, 
the coefficient of variation is 


\[ \text{C. V.} = \frac{2.5}{10} = 25 \text{ per cent} \]


This suggests that about one-third of our estimates of net migration 
(by the survival rate method) for specific age groups are in error by 
more than 25 per cent due solely to the effects of underenumeration. 
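
The "about one-third" figure rests on the normal distribution assumed earlier for the errors: roughly 32 per cent of a normal distribution lies more than one standard deviation from its mean. A small check, using the 2.5 and 10 per cent figures from the text (the function name is hypothetical):

```python
from math import erf, sqrt


def share_beyond(k_sd):
    """Two-sided share of a normal distribution lying more than k_sd
    standard deviations from its mean."""
    return 1 - erf(k_sd / sqrt(2))


sd_error = 2.5             # per cent of end-of-decade population
mean_net_migration = 10.0  # per cent of end-of-decade population, ages 20-35

coefficient_of_variation = sd_error / mean_net_migration
share = share_beyond(1.0)

print(f"C.V. = {coefficient_of_variation:.0%}; "
      f"share of errors beyond one standard deviation = {share:.3f}")
```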

If our results from the examination of these two sources of error are 
approximately correct, we see that the errors arising from underenumer- 
ation are at least as important as errors arising from the use of a single 
set of survival rates for all states. Thus in addition to improving
estimates of survival rates, we should attempt to devise ways to estimate
underenumeration and correct for the errors introduced. The lack of
data and the difficulty of obtaining the necessary data complicate the
task; but if estimates of net migration are useful, surely ways can be
devised to make adjustments for errors arising from these sources. 

In conclusion we might look briefly at the two sources of error 
together—that is, errors arising from the use of a single set of survival 
rates for all states and errors arising from underenumeration. Are errors
from these two sources correlated? If they are negatively correlated, 
then they tend to act in such a way as to reduce the total errors of esti- 
mate of net migration. On the other hand, if they are not correlated, 
the variance of the errors from the two sources will be additive. If they 
should be positively correlated, the total error variance will be greater 
than the sum of the error variances from the two sources. There seems
to be little reason why these two sources of error should be correlated
at all, although it might be possible to make a logical case for a posi- 
tive correlation between them. At any rate we can be fairly certain 
that the methods most frequently used to estimate net migration give 
us estimates that are likely to include an appreciable percentage error. 
For this reason caution must be exercised in interpreting such figures 
and small relative differences between states and decades should not be 
taken too seriously. 
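
The three cases described above follow from the usual expression for the variance of a sum. In the notation below, which is introduced here only for illustration, the two standard deviations refer to the errors from survival rates and from underenumeration expressed on a common scale, and rho is their correlation:

```latex
\sigma^2_{\text{total}} = \sigma^2_{\text{survival}} + \sigma^2_{\text{under}}
    + 2 \rho \, \sigma_{\text{survival}} \, \sigma_{\text{under}}
```

A negative rho reduces the total below the sum of the two variances, a zero rho makes them additive, and a positive rho raises the total above their sum.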


REFERENCES 


[1] Hamilton, C. Horace, and Henderson, F. M., “Use of the survival rate
method in measuring net migration,” Journal of the American Statistical 
Association, 39 (June 1944), 197-206. 

[2] Siegel, Jacob S., and Hamilton, C. Horace, “Some considerations in the use 
of the residual method of estimating net migration,” Journal of the American 
Statistical Association, 47 (September 1952), 475-500. 

[3] U. S. National Resources Committee, Population Statistics, Part 2 State 
Data. Washington: U. S. Government Printing Office, 1937. 

[4] Walker, Helen M., and Lev, Joseph, Statistical Inference. New York: Henry 
Holt and Co., Inc., 1953, 440. 





THE REDESIGN OF THE CENSUS CURRENT 
POPULATION SURVEY 


MORRIS H. HANSEN, WILLIAM N. HURWITZ,
HAROLD NISSELSON, AND JOSEPH STEINBERG* 
Bureau of the Census 

In February 1954 a redesign of the Current Population 
Survey was introduced that provided for a more efficient sys- 
tem of field organization and supervision as well as some ad- 
vances in methods. The sample is now spread over 230 areas 
instead of 68 areas with the same number of households as 
heretofore. A composite estimation procedure has been intro- 
duced which reduces the sampling variability for most esti- 
mates. Also, there has been a considerable reduction in the 
variance of variance estimates made from the sample. A sta- 
tistical quality control program has been introduced to help 
insure results of consistently acceptable quality. Problems 
arose in the process of shifting from one design to the other 
that resulted in some significant differences between the new 
and old samples for a few of the estimates, especially unem- 
ployment. Apparently response errors were the principal 
source of difficulty, and it was possible to take steps to bring 
the results within sampling error range. Work on the measure- 
ment and control of response errors is being expanded. 


INTRODUCTION 


IN FEBRUARY 1954, the Bureau of the Census introduced a redesign of
the Current Population Survey, from which data on employment,
unemployment, and related topics are compiled each month.
Information on other topics, such as income distribution, family
characteristics, marital status, migration, and education, is compiled less
frequently. Since 1943, the estimates had been made from a sample of
households in 68 primary sampling units spread throughout the United 
States. The main features of this sample design, which has been in 
operation for more than a decade, have been described elsewhere;¹
they will not be reviewed here, except where it is necessary to point up 
some of the principal changes introduced. It is to be noted, however, 
that the sample was set up as a general purpose sample, and the 
sample of areas is used for a monthly retail trade survey and numerous 
special surveys as well as for the monthly labor force measurements. 





* The work reported was the joint work of sampling and other staff members in the Bureau of the 
Census. This paper was presented at the Annual Meeting of the American Statistical Association at 
Montreal in September 1954. 

¹ Hansen, Morris H., Hurwitz, William N., and Madow, William G., Sample Survey Methods and
Theory, Vol. I, Chapter 12 B (prepared by J. Steinberg), New York: John Wiley and Sons, 1953, 559-82.




In summary, the redesign provides for a more efficient system of 
field organization and supervision, takes advantage of a more wide- 
spread sample that is made feasible when the requirement of a full- 
time supervisor in each sample area (which was accepted as a condition 
in the original design) is removed, and introduces some other advances 
in methods. As a result it is possible to provide more information per 
unit of cost, to increase the accuracy of published statistics, and per- 
haps to make regional estimates for summary characteristics. Also, 
variance estimates made from the sample are substantially improved. 

The main features of the new design that differ from the old are that 
the sample is now spread over 230 instead of 68 primary sampling 
units but with approximately the same total number of households 
included in the sample;² the supervisory organization has been changed,
the philosophy and methods of control of the field work have been 
modified, and a new estimation procedure that takes advantage of a 
rotating sample in estimating time series has been introduced. Also, the 
tabulations and estimates are now prepared on a high-speed electronic 
computer, the Univac. 

The fundamental principles of optimum design of a survey with 
multistage probability sampling are the same in the two surveys, but 
the redesigned survey incorporates some advancements in the applica- 
tion of the principles. The over-all guiding considerations in the design 
of a sample survey are to select the sample and carry through the col- 
lection, tabulating, and estimating operations in such a way as to pro- 
vide the desired information of the required accuracy (taking account 
of both sampling and nonsampling errors) within the necessary time 
limits and at near the minimum possible cost. We have also
considered it important to be able to estimate the sampling errors of the
estimates from the sample itself, and also, to the extent feasible, to 
estimate the nonsampling errors. 

The principal new features of the survey design and their implica- 
tions are discussed more fully in the material that follows. 





² A total of 24,000 to 26,000 dwelling units or other living quarters are designated for inclusion
in the sample each month. Completed interviews are obtained from about 20,000 to 22,000 households. 
Of the remainder, about 500 to 1,000 are households from which information should be collected but is 
not because the respondents are not found at home after repeated calls, or are temporarily absent, or are 
unable to be interviewed for other reasons. The other 2,500 to 3,500 designated units represent those 
found to be vacant, or occupied by persons with residence elsewhere, or otherwise not to be enumerated. 
The over-all sample size varies over time partly because of chance but also because of the growth of the 
population and the creation of new households. Every 2 or 3 years as the sample expands with popula- 
tion growth, it is necessary to decrease the sampling ratios slightly in order to keep the workload at the 
average of roughly 25,000 designated units prescribed under the survey budget. 





REDESIGN OF CENSUS SURVEY 703 


PRINCIPAL INNOVATIONS IN THE NEW SURVEY DESIGN 


Supervisory organization and its relationship to optimum design. In 
the original design of the 68-area sample a condition was imposed that
a district office should be located in every primary sampling unit in- 
cluded in the sample. An office was staffed with a supervisor, at least 
one full-time clerical assistant, and, in a few of the offices, with addi- 
tional supervisory and clerical personnel. 

The requirement of a full-time supervisor in each p.s.u. was imposed 
in an effort to insure that the work was carried out substantially in 
accordance with the specifications, not only in order to have a valid 
probability sample, but particularly in order to control response or 
measurement errors. It was felt that the proximity of the supervisors 
to the part-time interviewers would permit almost daily contact with 
the interviewers and therefore would provide effective means for 
supervisory control as well as a means for giving formal training to 
interviewers each month. With this restriction and with the available 
financial resources, a sample of about 21,000 interviewed households 
spread over 68 areas approximated the optimum design. 

If the supervisory restriction had not been imposed, the optimum 
design would have involved spreading the sample over a much larger 
number of primary sampling units, provided only that performance 
could be controlled satisfactorily so that response and other nonsam- 
pling errors would not be increased. 

More recently, evaluation of the one-supervisor-per-p.s.u. type of 
organization led us to the conclusion that we were paying a considerable 
overhead cost for each field office, and that we were not making as 
effective use as we might of the supervisory staff. Frequently, super- 
visors were doing interviewing and clerical operations because the 
supervisory aspects of the work did not constitute a full-time job. We 
concluded that the ratio of supervisory offices to interviewers was ex- 
cessive, and that by introducing appropriate statistical quality control 
principles in the control of the field work, a more effective organization 
could be established. Consequently, we decided to reduce the number 
of supervisory offices by about 50 per cent, with a smaller reduction in
rent, clerical staff, and other overhead costs associated with the offices. 
The supervisory personnel would be reduced only by about 25 per cent. 
This made it possible to retain a smaller number of higher-grade per- 
sons and from the savings in overhead, to pay for the additional travel 
involved in spreading the sample into a much larger number of primary 
sampling units. 







As before, this field organization is also used for the monthly retail 
trade survey, which is taken in the same areas, and for other field work. 

Sample selection. The expansion of the sample from 68 to 230 areas 
reduced markedly the sampling error of most of the important esti- 
mates of aggregates, and may make it possible to publish summary 
statistics for several geographic regions. The expansion of the number 
of p.s.u.’s reduced considerably the component of the variance due to 
the sample of p.s.u.’s, and made little or no change in the component 
of the variance due to the sample within p.s.u.’s. The between-p.s.u. 
variance term dominated the variance for most summary items in the 
original sample, and thus the gains through the reduction of this con- 
tribution are worthwhile. Table 1 shows approximate estimates of the 
between-p.s.u. and the within-p.s.u. contributions to the variance for 
the new sample for January 1954. For many items, the estimated be- 
tween-p.s.u. contribution to the variance for the 68-area design was 
more than five times as large as that for the 230-area design; the within- 
p.s.u. contribution for the 68-area design was about the same as for the 
230-area design. 
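
The comparison of the two designs can be made concrete with the rel-variance components reported for the 230-area sample. In the sketch below the between-p.s.u. and within-p.s.u. figures are those of Table 1; the factor of five applied to the between-p.s.u. term for the 68-area design is only an illustrative value taken from the phrase "more than five times as large," and the item labels are shortened.

```python
from math import sqrt

# (between-p.s.u., within-p.s.u.) rel-variance contributions, 230-area sample,
# January 1954, as given in Table 1.
CONTRIBUTIONS = {
    "civilian labor force": (0.000010, 0.000021),
    "agricultural employment": (0.00026, 0.0016),
    "nonagricultural employment": (0.000019, 0.000038),
    "unemployed": (0.00036, 0.0012),
}


def coefficient_of_variation(between, within, between_factor=1.0):
    """Square root of the total rel-variance, with the between-p.s.u.
    contribution optionally inflated by between_factor."""
    return sqrt(between_factor * between + within)


cv_230 = {item: coefficient_of_variation(b, w) for item, (b, w) in CONTRIBUTIONS.items()}
# Illustrative 68-area figures: between-p.s.u. term assumed five times as large,
# within-p.s.u. term unchanged.
cv_68 = {item: coefficient_of_variation(b, w, between_factor=5.0)
         for item, (b, w) in CONTRIBUTIONS.items()}

for item in CONTRIBUTIONS:
    print(f"{item}: 230-area {cv_230[item]:.2%}, 68-area (assumed) {cv_68[item]:.2%}")
```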

A particularly important gain from the increase in the number of 
p.s.u.’s is the increase in the precision of variance estimates. With the 
68-area sample, variance estimates made from the sample itself were 
subject to relatively wide sampling fluctuations because of a number of 
factors involved in stratified sampling from non-normal populations,³
and especially because the estimates were based on a small number of 
degrees of freedom. With the 230-area sample, the number of degrees 
of freedom has been increased sufficiently to yield variance estimates of 
much greater precision. 

In the new design, as in the old, each p.s.u. was defined to be as 
heterogeneous as feasible subject to a restriction on its total area. Thus 
each p.s.u. included one or a few adjoining counties. In the new sample, 
however, each standard metropolitan area either constitutes a self- 
representing area or a separate p.s.u. in order to facilitate making esti- 
mates for the aggregate of all metropolitan areas from the present 
sample, and to facilitate the expansion of the sample when data for 
separate metropolitan areas are desired.⁴

³ For a discussion of this problem, see Hansen, Morris H., Hurwitz, William N., and Madow,
William G., Sample Survey Methods and Theory, New York: John Wiley and Sons, 1953, Vol. I, Chapter 10,
431-35.

⁴ Standard metropolitan areas were established subsequent to the original sample design. In the
original design the self-representing p.s.u.'s approximated metropolitan districts, as then defined.


TABLE 1
ESTIMATED REL-VARIANCES (COEFFICIENTS OF VARIATION
SQUARED) OF SELECTED LABOR FORCE ITEMS FOR
230-AREA CPS SAMPLE, JANUARY 1954

                                                        Between-     Within-
                                           Total        P.S.U.       P.S.U.
Item                                       Rel-         Contribu-    Contribu-
                                           Variance     tion         tion
                                                        (b²)         (w²)

Civilian labor force divided by popula-
  tion 14 years old and over               .000031      .000010      .000021
Agricultural employment divided by
  farm population 14 years old and
  over                                     .0019        .00026       .0016
Nonagricultural employment divided
  by population 14 years old and over      .000057      .000019      .000038
Unemployed divided by population 14
  years old and over                       .0015        .00036       .0012


More recent data were used in defining the strata into which the
primary units were grouped although only the total population counts
from the 1950 Census were available in time for use in stratifying the
primary units. The primary strata of the 230-area design were made
more nearly equal in size in terms of aggregate populations than in the
68-area design. One primary unit was included in the sample from a
stratum, as before, and the primary units included in the sample were
selected with probabilities proportionate to population (the 1950
population in the case of the new sample). In the selection of the sample of
primary units restrictions between strata were used to increase the
geographic stratification of the sample,⁵ that is, to decrease the
probability of obtaining geographically contiguous primary units.
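
Selecting one primary unit per stratum with probability proportionate to population can be sketched with a cumulative-total rule. This is a generic illustration under stated assumptions, not the Bureau's procedure: the stratum names and populations are hypothetical, each stratum is drawn independently, and the restrictions between strata (controlled selection) mentioned above are not implemented.

```python
import random
from itertools import accumulate


def select_pps(populations, u):
    """Index of the unit selected with probability proportionate to its
    population, given a uniform random number u in [0, 1)."""
    cumulative = list(accumulate(populations))
    target = u * cumulative[-1]
    for index, upper in enumerate(cumulative):
        if target < upper:
            return index
    return len(populations) - 1  # guards against rounding at the upper end


# Hypothetical strata, each a list of primary-unit populations.
strata = {
    "stratum A": [120_000, 80_000, 50_000],
    "stratum B": [60_000, 60_000, 90_000, 40_000],
}

rng = random.Random(1954)  # fixed seed so the draw can be repeated
sample = {name: select_pps(units, rng.random()) for name, units in strata.items()}
print(sample)
```

A unit holding 120,000 of a stratum's 250,000 people is chosen whenever u falls below .48, so its selection probability equals its share of the stratum population.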

It is to be emphasized that drawing the sample on the basis of more 
recent information for stratification and sample selection may result 
in some reductions in sampling error, but redrawing the sample with 
up-to-date information is not essential in order to include units in the 
sample with known probabilities. An area probability sample drawn in 
1943 provides a probability sample of areas, and consequently of people, 
retail stores, or other units that can be associated with areas both in 
1943 and in 1954; moreover, such a sample would continue to provide 
a probability sample in future decades provided that the areas can be 
uniquely identified over such a period and if the existence of a sample 
in an area does not change the character of the area. 





⁵ Goodman, Roe, and Kish, Leslie, “Controlled selection—a technique in probability sampling,”
Journal of the American Statistical Association, 45 (1950), 350-72.
















Up-to-date information from censuses or other sources can be used 
in the selection of a new sample to provide a more efficient probability 
sample, but the principal gains from such information can also be made 
with an area sample drawn at an earlier time by appropriate introduc- 
tion of such new information in the estimation procedure. This, in 
fact, was done in making the estimates from the 68-area sample after 
the 1950 Census data became available. The use of more recent infor- 
mation in selecting a new sample may result in some further gains. 
However, a complete change in the sample would not have been worth 
the high cost of making the change unless the other gains to be achieved 
made the changeover worthwhile, as was the case in this particular 
instance. 

We emphasize these points because there has been some misunder- 
standing concerning them. Thus, it has sometimes been stated that use 
of 1940 Census data for stratification and for determining probabilities 
of selection invalidated the 68-area sample for subsequent periods. In 
fact, the principal factor that reduces the sampling error in the new 
230-area design as compared with the 68-area sample is the increase 
in the number of p.s.u.’s. For example, even though the 68-area sample 
was selected on the basis of 1940 Census data, and the 230-area sample 
used more recent information, we can estimate 1940 Census character- 
istics with a smaller variance from the 230-area sample. Table 2 com- 
pares some specific estimates for 1940 made from the 68-area sample 
(selected in 1943) and the 230-area sample (selected in 1952). 

The subsampling in the 230 areas was essentially the same as in the 
68-area sample, except that 1950 Census data were used in defining the 
sample segments. 

Rotation of the sample. One feature of the subsampling which is 
present in both the old and new designs has an important bearing on 
the estimation procedure introduced in the new sample. This feature 
involves changing a part of the sample each month to avoid a decline 
in respondent cooperation (which may happen when a constant panel 
is interviewed indefinitely) and to reduce the variances of sample 
estimates under certain circumstances. 

To accomplish this rotation, eight systematic subsamples (rotation 
groups) of segments are identified for each sample. A given rotation 
group is interviewed for a total of eight months, divided into two equal 
periods. It is in the sample for four consecutive months one year, leaves 
the sample during the following eight months, and then returns for the 
same four calendar months of the next year. It is then dropped from 
the sample. In any one month one-eighth of the sample segments are 


















































TABLE 2 

LABOR FORCE STATISTICS FROM COMPLETE CENSUS COMPARED WITH 
ESTIMATES* FROM 230-AREA AND 68-AREA SAMPLES BASED ON COMPLETE 
CENSUS TOTALS FOR THE SAMPLE COUNTIES: APRIL 1940 

(in thousands) 

Employment status                 Complete     230-area     68-area 
                                   census       sample       sample 

Total labor force                  52,790       52,850       52,600 
  Employed                         45,170       45,180       45,060 
    Farmers and farm managers       5,140        5,190        5,100 
  Unemployed                        7,620        7,670        7,540 
Not in labor force                 48,310       48,250       48,500 











* Involve approximations to the race-residences and age-sex ratio estimates actually used in CPS, 
based on 1940 Census data. The complete census data are not available in sufficient detail to make iden- 
tical ratio estimates possible. 


in their first month of enumeration, another eighth are in their second 
month, etc., with the last eighth in for the eighth time (the fourth 
month of the second period of enumeration). Under this system, 75 
per cent of the sample segments are common from month to month 
and 50 per cent are common from year to year (i.e., from one month 
to the same month a year earlier). 
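
The overlap percentages quoted for this rotation pattern can be checked with a short sketch. It assumes, for illustration only, that one new rotation group enters every month and that group g enters in month g; the function names are hypothetical.

```python
def groups_in_sample(month):
    """Rotation groups interviewed in `month` under the 4-8-4 pattern.

    Assumption: group g enters in month g, so it is interviewed in months
    g..g+3 and again in months g+12..g+15.
    """
    first_period = set(range(month - 3, month + 1))     # 1st to 4th interview
    second_period = set(range(month - 15, month - 11))  # 5th to 8th interview
    return first_period | second_period


def overlap_fraction(month_a, month_b):
    """Share of the rotation groups of month_a also interviewed in month_b."""
    a, b = groups_in_sample(month_a), groups_in_sample(month_b)
    return len(a & b) / len(a)


print(len(groups_in_sample(100)))   # eight rotation groups in any month
print(overlap_fraction(100, 101))   # month-to-month overlap
print(overlap_fraction(100, 112))   # year-to-year overlap
```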

Estimation procedure. The new estimation procedure makes use of 
what is referred to as a composite estimate. The estimate for each 
item is a composite or weighted average of two estimates. These two 
estimates of the same item are not independent but when properly 
weighted may yield a composite estimate with a smaller variance than 
either of the component estimates. 

(a) The first of these estimates, referred to as the regular ratio esti- 
mate, is obtained by essentially the same two-stage ratio estimating 
procedure that was used in making estimates from the 68-area sample. 
The first-stage ratio estimates take advantage of 1950 Census informa- 
tion for counties. The second-stage ratio estimates take advantage of 
current figures on the age-sex-color distribution of the civilian popula- 
tion of the United States. In effect, after the sample returns have been 
multiplied by the first-stage ratio estimate factors, they are used to 
estimate the percentage distribution by employment status within an 
age-sex-color group; the percentages are then applied to the known 
total population for that group. Thus the problem of estimating an 
aggregate (such as the total number of persons in the labor force) is 







reduced to one of estimating percentages, with a consequent reduction 
in the sampling error for most of the items. 
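
A minimal sketch of the second-stage step described in (a): within each age-sex-color group the weighted sample gives a percentage distribution by employment status, and the percentages are applied to the independently known population of the group. The group labels, status labels, and all figures below are invented, and the function name is hypothetical; the first-stage factors are assumed to have been applied already.

```python
def second_stage_ratio_estimate(weighted_counts, known_totals):
    """Apply sample percentage distributions to known group totals.

    weighted_counts: {group: {status: weighted sample count}}
    known_totals:    {group: independent population figure for the group}
    Returns {status: estimated aggregate over all groups}.
    """
    estimate = {}
    for group, by_status in weighted_counts.items():
        group_sample_total = sum(by_status.values())
        for status, count in by_status.items():
            share = count / group_sample_total  # estimated proportion in the group
            estimate[status] = estimate.get(status, 0.0) + share * known_totals[group]
    return estimate


# Invented illustration with two groups.
weighted_counts = {
    "group_1": {"in_labor_force": 60.0, "not_in_labor_force": 40.0},
    "group_2": {"in_labor_force": 30.0, "not_in_labor_force": 70.0},
}
known_totals = {"group_1": 1000.0, "group_2": 2000.0}
result = second_stage_ratio_estimate(weighted_counts, known_totals)
print(result)
```

The estimated aggregates always add up to the known population total, which is why only the percentages, not the totals, are subject to sampling error.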

(b) The second estimate that enters into the composite estimate is 
the estimate for the preceding month to which has been added an 
estimate of the change in each item from the preceding month to the 
present month. The estimates from which the change is computed are 
made with the estimating procedure described in (a), but they involve 
for each month only the returns from the sample segments that are 
common to the 2 months (constituting 75 per cent of the sample). 

The weights for the two components of such a composite estimate 
need not necessarily be equal, and for estimating any particular item 
optimum weights might be chosen. In this instance the weights used 
are equal, because equal weights in this case satisfy the condition that 
for most items there will be some gain in reliability of the estimates 
over those obtained from the procedure described in (a). The composite 
estimate takes advantage of accumulated information from earlier 
samples as well as the information from the current sample, and 
results in smaller variances of estimates of both level and change for 
most items, but the larger gains are achieved, for the most part, in the 
estimates of change. 

The way in which the previous samples contribute to the current 
estimate can be seen if we write the composite estimate as: 


$$x''_u = K\,(x''_{u-1} + x'_{u,u-1} - x'_{u-1,u}) + (1-K)\,x'_u, \qquad 0 \le K \le 1 \tag{1}$$

where 

$x''_u$ is the composite estimate for month u, 

$x'_u$ is the regular ratio estimate based on the entire sample for 
month u, 

$x'_{u,u-1}$ is the regular ratio estimate for month u but made from the 
returns from the segments that are included in the sample 
in both months u and u−1, and 

$x'_{u-1,u}$ is the regular ratio estimate for the previous month (u−1) 
but made from the returns from the segments that are 
included in the sample in both months u and u−1. 


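Now, if we let 

$$y'_u = K\,(x'_{u,u-1} - x'_{u-1,u}) + (1-K)\,x'_u \tag{2}$$

then 

$$x''_u = y'_u + K\,x''_{u-1} \tag{3}$$

$$x''_u = y'_u + K\,y'_{u-1} + K^2 y'_{u-2} + K^3 y'_{u-3} + \cdots + K^{u-1} y'_1 + K^u y'_0 \tag{4}$$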







where $y'_0$ is defined as the initial estimate for the month preceding 
the month for which the composite estimate is first made. 
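
As an illustrative check, the composite estimate can be computed either month by month from its recursive definition or from the expanded sum of weighted earlier terms, and the two agree. The monthly figures below are invented and the function names are hypothetical; `x_cur` and `x_prev` stand for the two regular ratio estimates made from the segments common to adjacent months.

```python
def composite_recursive(x_full, x_cur, x_prev, y0, K=0.5):
    """Composite estimates built month by month from the recursion."""
    composite = y0  # initial estimate for the month before the first composite
    series = []
    for full, cur, prev in zip(x_full, x_cur, x_prev):
        composite = K * (composite + cur - prev) + (1 - K) * full
        series.append(composite)
    return series


def composite_expanded(x_full, x_cur, x_prev, y0, K=0.5):
    """The same estimates written as a weighted sum of current and earlier terms."""
    y = [y0] + [K * (cur - prev) + (1 - K) * full
                for full, cur, prev in zip(x_full, x_cur, x_prev)]
    return [sum(K ** i * y[u - i] for i in range(u + 1))
            for u in range(1, len(y))]


# Invented three-month illustration.
x_full = [100.0, 102.0, 101.0]   # regular ratio estimate, entire sample
x_cur = [101.0, 103.0, 100.0]    # current month, common segments only
x_prev = [99.0, 101.0, 102.0]    # previous month, common segments only
rec = composite_recursive(x_full, x_cur, x_prev, y0=98.0)
exp = composite_expanded(x_full, x_cur, x_prev, y0=98.0)
print(rec, exp)
```

The expanded form makes visible that a sample taken i months earlier enters the current estimate with weight proportional to the i-th power of K.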

The variance of this composite estimate (under some simplifying 
assumptions) is given in the appendix. The between-p.s.u. variance is 
the same for the composite estimate as for the regular ratio estimate, 
because the same sample of primary units is used every month. The 
reduction in variance through the use of the composite estimate is 
entirely in the within-p.s.u. contribution to the variance. Because for 
many items the within-p.s.u. variance accounts for most of the total 
variance of month-to-month change, the gains on estimates of change 
are particularly worthwhile. For the rotation system described above 
and with some simplifying assumptions, Table 3 gives a few illustrative 
cases of changes in the within-p.s.u. variance when the composite esti- 
mate is applied. 

TABLE 3 

RATIO OF WITHIN-PSU VARIANCE OF COMPOSITE ESTIMATE TO THAT OF 
REGULAR RATIO ESTIMATE FOR SOME ALTERNATIVE ASSUMED CORRELATIONS* 

    Values of ρ†            Ratio with K = .5‡             Ratio with K = .6‡ 
  ρ1     ρ2     ρ3       Level       Month-to-month     Level       Month-to-month 
                                         change                         change 

 0.95   0.90   0.85   [illegible]        0.44         [illegible]        0.36 
 0.90   0.85   0.80   [illegible]        0.54         [illegible]        0.49 
 0.70   0.65   0.60   [illegible]        0.80         [illegible]        0.79 
 0.50   0.45   0.40   [illegible]        0.94         [illegible]        0.95 
 0.00   0.00   0.00   [illegible]        1.10         [illegible]        1.14 

* Assumes rotation of segments in sample with a segment in the sample four months, out eight 
months, in again four months, as described earlier. The correlations between estimates 11, 12, and 13 
months apart are not given because they have a trivial effect on these estimates. 

† ρ1 is the correlation one month apart of estimates from identical segments; 
ρ2 is the correlation two months apart of estimates from identical segments; 
ρ3 is the correlation three months apart of estimates from identical segments. 

‡ K is defined in Equation (1). 


Effect of length of time in sample on response. One interesting point 
that affects the estimation method deserves further attention. If there 
were no measurement errors involved, the value of any characteristic 
measured from the sample would be exactly the same, on the average, 
whether measured from a rotation group in the sample for the first time, 
or from one that had been in the sample for one or more earlier months. 
This appears to be the case for most characteristics measured in the 







labor force survey. For unemployment, however, there is evidence that 
households tend to report a somewhat higher unemployment rate the 
first month they are in the sample than in subsequent months. During 
1948-1952, the difference between unemployment estimated from 
these segments in the sample for the first month and the average esti- 
mate from those segments in the sample for the five subsequent months 
was 0.2 per cent of the population 14 years of age or over. This differ- 
ence did not appear to vary with differing levels of unemployment, 
and reflected about a 10-per cent difference between the average un- 
employment rate for the rotation groups in the sample for the first 
time and the average for the five subsequent times (during a period 
when each rotation group was in the sample six successive months in- 
stead of following the rotation pattern described above). 

Because the different rotation groups enter into the regular ratio 
estimate with different weights than they enter into the composite 
estimate, such differences for rotation groups mean that the expected 
values of the regular ratio and the composite estimates differ slightly. 
The average unemployment based on the composite estimate for the 
period 1948-1952 was 2.04 million compared to 2.09 million from the 
regular ratio estimate, a difference of about 2½ per cent. The average 
difference is smaller than the sampling error of either estimate for an 
individual month. Nevertheless, if we can ascertain why such a differ- 
ence occurs, it should help in understanding and controlling response 
errors; this difference is the subject of investigation at the present 
time. An explanation has not been found from the initial investigations. 
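
The direction of this difference can be illustrated with a rough sketch. It rests on assumptions that are not in the text: the four-eight-four rotation pattern described earlier (not the six-month pattern in force during 1948-1952), estimates that behave like simple averages over rotation groups, and a composite estimate that has run long enough for its expected value to settle. Under those assumptions, taking expected values in Equation (1) gives an expected composite estimate equal to the expected regular ratio estimate plus K/(1 − K) times the expected difference between the two common-segment estimates. The bias figure is invented and the function name is hypothetical.

```python
def expected_composite_minus_regular(bias_by_interview, K=0.5):
    """Steady-state expected difference, composite minus regular ratio estimate.

    bias_by_interview[m] is the assumed response bias of a rotation group at
    its (m+1)-th interview, m = 0..7, under the 4-8-4 rotation pattern.
    """
    b = bias_by_interview
    # Segments common to months u and u-1 are at interviews 2,3,4,6,7,8 in
    # month u and were at interviews 1,2,3,5,6,7 in month u-1.
    current_common = sum(b[m] for m in (1, 2, 3, 5, 6, 7)) / 6
    previous_common = sum(b[m] for m in (0, 1, 2, 4, 5, 6)) / 6
    return K * (current_common - previous_common) / (1 - K)


# Invented example: only the first interview reads high, by 0.6 units.
bias = [0.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
gap = expected_composite_minus_regular(bias, K=0.5)
print(gap)
```

With a higher reading only at the first interview the gap is negative, that is, the composite estimate runs below the regular ratio estimate, which is the direction of the difference reported above.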

Variance estimates. As has been indicated, the variance estimates 
made from the 68-area sample were subject to relatively large vari- 
ances. The 230-area sample provides much more reliable estimates of 
sampling variability than were possible with the 68-area sample. Be- 
cause of the unreliability of the variance estimates made directly from 
the 68-area sample, a curve-fitting procedure was used to reduce the 
variance of the estimated variances. Also, the curve-fitting procedure 
saved computing time and facilitated presentation of sampling errors. 
While such methods will reduce the sampling variability of the vari- 
ance estimate, they may yield biased estimates. The improved variance 
estimates from the 230-area sample suggest that some of the variances 
estimated for the 68-area survey were substantial underestimates and that 
some others were substantial overestimates. 

Until the acquisition of a high-speed electronic computer, the 
Univac, extensive approximations were introduced into the estimates of 
the variances to avoid computations that would be exceedingly time- 







consuming with the available equipment. The availability of the 
Univac makes it possible to avoid most of these approximations. Even 
with the electronic computer, however, the work of making variance 
computations would be extremely heavy if variances were computed 
for all items directly. Approximate methods will continue to be used in 
the future, but they will be evaluated by more exact computations than 
have been feasible in the past. 

Quality Control. A system of statistical quality control on field inter- 
viewing was introduced as a part of the new survey design to help in- 
sure results of consistently acceptable quality from the CPS. At the 
present time the quality control program is based on reinterviews of a 
subsample of the work of about 12 per cent of the enumerators each 
month. The principles applied are those of process control as described 
in the statistical quality control literature. The program has been de- 
signed to identify those enumerators whose work is beyond acceptable 
limits of performance so that they may be singled out for retraining or 
other administrative action. The basic philosophy is to concentrate the 
principal supervisory emphasis at those points where it will be most 
effective—and to give less attention to interviewers whose work is under 
satisfactory control. This approach is regarded as effective because of 
the relative stability of the interviewer force. 

Response errors. Table 4 gives a frequency distribution of the average 
gross differences in results between original interviews and reinter- 
views covering six recent months of experience with the quality check. 
Results are presented for both “coverage,” which refers to the enumera- 
tion of the population in the sample, and “content,” which refers to the 
characteristics of those persons. The check results for each interview 
are used as the standard from which percentage differences or gross 
“error rates” are computed. These differences between the original 
interview and the check are defined as “errors” although often they 
represent simply response variation without a necessary implication of 
an error having been made by the original interviewer. 

In addition to furnishing information on individual enumerators, the 
reinterviews taken for the quality control operation provide a continu- 
ing source of information on the average overall quality of the CPS pro- 
gram. Table 5 shows results of the original and the check interviews 
over a 6-month period; it also shows the proportion of persons identi- 
fied in a particular class in the reinterview who were identified in that 
same class in the original interview. Some of the items show relatively 
large differences between the original and check reports. This is 
especially true for persons working part-time, those who have a job but 







TABLE 4 

DISTRIBUTION OF INTERVIEWERS’ GROSS “ERROR” RATES:* 
CPS CHECK, MAY-OCTOBER, 1954 

Columns: Gross error rate (per cent); Number of interviewers; 
Cumulative per cent of interviewers; Number of errors; 
Cumulative per cent of errors. 

Rows: separate distributions for Coverage and for Content, each with a 
Total line followed by classes of gross error rate, the last of which 
is “25.0 and over.” 

[The entries of this table are not legible in the source scan. The 
surviving fragment of the Coverage total line reads “229 224 178 100.”] 

















* The check interview is carried out by a supervisor or special interviewer without knowledge of the 
results of the original interview. However, after each check interview is completed the results of the 
original interview are compared with the check results, and differences are discussed with the respondent 
in an effort to explain discrepancies and to obtain the most accurate responses feasible. Any corrections 
to the original check interview results are recorded. After this reconciliation, the original check interview 
results tend to be sustained for about two-thirds of the original differences and the original interviewers’ 
results are sustained for about one-third of the original differences. The results presented as “errors” in 
this table are based on the differences after such reconciliation. 







TABLE 5 

MONTHLY AVERAGE RESULTS FROM ORIGINAL AND CHECK INTERVIEWS* 
FOR PERSONS INCLUDED IN REINTERVIEW SAMPLE, MAY-OCTOBER 1954 

                                              Monthly average 
                                             number of persons          Per cent 
Employment status                           Original      Check        identically 
                                           interview    interview       reported† 

Labor force                                   1,221       1,242        [illegible] 
  Employed                                    1,149       1,168        [illegible] 
    Agriculture                                 158         164        [illegible] 
    Nonagriculture                              991       1,004        [illegible] 
      Full time (worked 35 hours or more)       666    [illegible]     [illegible] 
      Part time (worked less than 35 hours)     253    [illegible]     [illegible] 
      With a job but not at work                 72    [illegible]     [illegible] 
  Unemployed                                     72    [illegible]     [illegible] 
Not in labor force                              945    [illegible]     [illegible] 

* Reinterview averages are based on results after reconciliation. See footnote *, Table 4. 
† Per cent of persons reported in specified class in reinterview who were reported in that identical 
class in the original interview. 


are not at work, and those unemployed, and thus the differences tend 
to be concentrated among those groups with marginal attachments to 
the labor force. Table 6 permits comparisons of average results for 
segments in the sample for the first time, and for continuing segments. 
Only a few of the differences between original and check results in 
Tables 5 and 6 are statistically significant. 

Problems in measuring response errors and in controlling the quality 
of field interviewing vary directly with the difficulty in separating and 
measuring errors arising from limitations of the data-collection instru- 
ment (questionnaire, definitions, etc.) itself, errors from the respondent, 
and errors arising from interviewer performance. In measuring char- 
acteristics, as distinguished from coverage, the response variability is 
sometimes large even when the interviewer carries through the ques- 
tioning procedure exactly as specified. For such characteristics the 
minimum aim will be to attain a small net error (bias), but there is no 
assurance that efforts which are successful in maintaining a small net 







error will result also in a small gross error. Thus, for some of the more 
difficult labor force and related measurements, the contribution of 
interviewer error to the gross error rate may be relatively small compared 
with those that are a consequence of the concepts and definitions 
employed and errors made by respondents. In such circumstances 
modification of the concepts and definitions may be needed.⁶ 

Problems in shifting from old to new sample. Funds which could be 
used for shifting from the 68 to the 230-area sample were not available 
until August 1953, when the decision was made to introduce the new 
sample during the latter part of 1953 and early in 1954. The survey 
work was initiated earlier in a few areas to allow opportunity for 
preliminary testing and modification of procedures. Then, in November 
1953 the survey work was started in one-third of the new areas; two- 
thirds were in operation in December; and the entire sample was in 
operation in January 1954 on a “dry-run” basis. The results of the old 
and new surveys for January were reasonably consistent for most 
items, but important differences were observed for some. For unem- 
ployment, in particular, the new estimate exceeded the old by about 
30 per cent, a relatively large difference and greater than could have 
been explained by sampling variability. However, from some points of 
view the difference may not be large, even though statistically sig- 
nificant, for an item whose measurement often involves the evaluation 
of attitudes; one survey gave an estimate of a little less than 4 per cent 
of the labor force as unemployed, and the other approximately 5 per 
cent. 

The use of a probability sample made it possible to estimate the dif- 
ferences that might reasonably occur from the change in samples. Be- 
cause the observed difference for unemployment was larger than 
could be explained as sampling error, it was clear either that the 
sampling operations had not been carried out in accordance with the 
specifications, or that differences in measurement or response must 
account for an important part of this difference between the two survey 
results. Failure to find important departures from the sampling speci- 
fications led to the inference that an important part of the difference 
results from other factors. 





6 An illustration of a problem of this type may be found in the U. S. Census of Agriculture. Prior 
to the 1950 Census of Agriculture, the definition of a “farm” for purposes of the census enumeration de- 
pended for places of less than 3 acres upon the value of products sold or consumed by the operator's 
household. The enumeration of the value of products in those cases was found to be subject to relatively 
large response errors. Some improvement in this case resulted from dropping dependence on value of 
products from the definition of a farm for the census enumeration and substituting a list of specific char- 
acteristics. 







TABLE 6 

MONTHLY AVERAGE RESULTS FROM ORIGINAL AND CHECK INTERVIEWS* 
FOR PERSONS INCLUDED IN REINTERVIEW SAMPLE, BY NEW AND 
CONTINUING SEGMENTS, MAY-OCTOBER 1954 

                                             Continuing segments          New segments 
Employment status                           Original      Check       Original      Check 
                                           interview    interview    interview    interview 

Total persons 14 years old and over           1,667       1,667          499          499 
Labor force                                     942         957          279          285 
  Employed                                      889         901          260          267 
    Agriculture                                 128         131           30           33 
    Nonagriculture                              761         770          230          234 
      Full time (worked 35 hours or more)       509         504      [illegible]      159 
      Part time (worked less than 35 hours)     200         210      [illegible]       56 
      With a job but not at work                 52          56      [illegible]       19 
  Unemployed                                     53          56      [illegible]       18 
Not in labor force                              725         710      [illegible]  [illegible] 

















* Check averages are based on results after reconciliation. See footnote *, Table 4. 


One might expect that new interviewers recently recruited and 
trained in the measurement of the rather difficult concepts involved 
might be the principal source of difference, and that as these new inter- 
viewers acquired more experience the results from the new survey 
would approximate those of the old. Undoubtedly this was a factor, 
but it appeared not to be the principal one. On the contrary, evidence 
indicated that the old sample, more than the new, was yielding results 
that were out of line. For example, the new sample was providing 
higher estimates of some of the marginal classes of the unemployed and 
of other labor force categories, and past experience had indicated that 
more careful work was needed to identify more of these marginal cases. 
Also, during the last few months of 1953 and especially in January 
1954, the number of persons receiving unemployment insurance bene- 
fits had become an increasingly higher fraction of total unemployment 
as estimated from the 68-area survey, and a considerably higher 
fraction than had been observed at most dates in earlier years. 

One factor that might explain the relationships appeared to be par- 







ticularly important. During the period of introducing the new sample 
in the field, beginning in September 1953, the supervisors had been 
instructed to concentrate their attention on recruiting and training the 
staff for the new survey, and on getting the new sample into operation. 
The experienced interviewers were to carry on for several months with 
training and review of their work by mail. This was a sharp change in 
procedure—this staff had been accustomed to continued personal at- 
tention. At the same time, most of the old interviewers probably were 
hearing rumors that their jobs would be terminated shortly, even be- 
fore they were officially notified. 

This type of reasoning suggested that the performance of the ex- 
perienced interviewers might have deteriorated, and that it would be 
desirable to reinforce the supervision and training activities in the old 
sample as well as the new before the February survey. Accordingly, an 
additional special training session was arranged in February for the 
interviewers in both the 68-area and 230-area designs. 

The results in February came considerably closer together— 
sufficiently close that the differences might be explained as entirely 
due to sampling variability. There was, however, some supplementary 
evidence that other factors might still be present, although in a con- 
siderably lesser degree than in January. It appeared that because of the 
retraining, or perhaps as a result of the widespread publicity regarding 
the difference in the unemployment estimates and the effect this pub- 
licity might have had on the interviewers, or perhaps because of both 
of these and other factors, the differences were largely removed in the 
February survey. The 68-area sample was discontinued in March. 

Table 7 shows the estimates of the major labor force characteristics 
for January and February from both samples, together with the 
sampling errors of the differences between the estimates. 
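
The comparison made in the preceding paragraphs amounts to expressing each difference between the two samples in units of its estimated standard error. A minimal sketch, using the unemployment estimates and standard errors reported in Table 7 (in thousands); the function name is hypothetical.

```python
def difference_in_standard_errors(new_estimate, old_estimate, standard_error):
    """Return the difference and the difference divided by its standard error."""
    difference = new_estimate - old_estimate
    return difference, difference / standard_error


# Unemployment, 230-area versus 68-area sample (thousands), from Table 7.
january = difference_in_standard_errors(3087, 2359, 188)
february = difference_in_standard_errors(3780, 3385, 233)
print(january[0], round(january[1], 2))
print(february[0], round(february[1], 2))
```

A January difference of nearly four standard errors is hard to attribute to sampling variability, whereas the February difference of well under two standard errors could plausibly arise from sampling alone.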


FUTURE DEVELOPMENTS 


When the new survey results were published in January, the 
Secretary of Commerce appointed a Committee⁷ to review the reasons for 
the differences in the new and old survey results. This Committee and 
the Census staff both have worked extensively in exploring the results 
obtained and have outlined areas for further work. A continuing re- 
search program has been in progress on a small scale, and funds are 
being requested for expanding this research. Topics for particular at- 
tention include further work on response errors in labor force measure- 





7 Consisting of Frederick F. Stephan, Chairman, Lester R. Frankel, and Lazare Teper. 







TABLE 7 

PERSONS IN CIVILIAN LABOR FORCE BY EMPLOYMENT STATUS, AS ESTIMATED 
FROM 230-AREA AND 68-AREA SAMPLES, WITH SAMPLING ERRORS OF 
DIFFERENCES, JANUARY AND FEBRUARY 1954 

(1000's) 

                                                                      Estimated 
                                 230-area    68-area                  standard 
Employment status                 sample      sample    Difference    error of       d/s 
                                                            (d)       difference 
                                                                         (s) 

January 
  Civilian labor force            62,840      62,137   [illegible]       591     [illegible] 
  Nonagricultural employment      54,469      54,433   [illegible]       756     [illegible] 
  Agricultural employment          5,284       5,345   [illegible]       353     [illegible] 
  Unemployed                       3,087       2,359   [illegible]       188     [illegible] 

February* 
  Civilian labor force            64,079      63,491   [illegible]       603     [illegible] 
  Nonagricultural employment      54,535      54,480        55           757     [illegible] 
  Agricultural employment          5,761       5,626       135           376        0.36 
  Unemployed                       3,780       3,385       395           233        1.70 




















* The February estimates for the 230-area sample given here differ from the published estimates 
which are based on a composite estimating method. The estimates given here are based on the regular 
ratio estimate using the February sample and are the appropriate estimates for this comparison. 


ment, questionnaire design, interviewing, the effectiveness of alterna- 
tive training procedures, and quality control methods. Attention is 
being given to identifying the reasons for the small but persistent dif- 
ference in unemployment measurements for households in the sample 
for the first time, and the measurements obtained from interviews in 
the same households in subsequent months. 

In an attempt to realize the objective of maximum information per 
dollar spent, a particularly important class of problems arises out of 
questions as to the effects of training, observing, checking and other 
“non-productive” work on the quality of data obtained. Objective 
evaluation of these effects is urgently needed to replace the present 
intuitive guides which may often be misleading. We hope to arrive at 







this evaluation by experiment and test. The existence of such data may 
have far-reaching effects on the organization and administration of 
field surveys. 

There is a need, also, for careful study of accuracy requirements in 
the results for various purposes. This problem is exceedingly difficult 
and long neglected; it needs extensive attention of the Census staff 
and of the users of the labor force statistics. 


APPENDIX: WITHIN-P.S.U. VARIANCE OF THE COMPOSITE ESTIMATE 
[Prepared by Max Bershad and Margaret Gurney] 
A. Monthly Level 
From equation (4) 

$$x''_u = \sum_{i=0}^{u} K^i\, y'_{u-i}. \tag{4a}$$

Each $y'_u$ can be expressed as the expected value plus a random deviation 
resulting from the selection of p.s.u.’s and a random deviation resulting from the 
selection made within the selected p.s.u.’s. For a fixed selection of p.s.u.’s, the con- 
ditional expected value, over all possible within samples, of the product of the 
within deviation and the between deviation is zero; and consequently the 
expected value of the product is zero when taken over all possible selections of 
p.s.u.’s. This is also true when the within deviation for one month is multiplied 
by the between deviation of another month. The two types of deviations are un- 
correlated and the total variance is expressible as the sum of the between-p.s.u. 
variance and the within-p.s.u. variance. 

The sample of p.s.u.’s is unchanged from month to month and the between-p.s.u. 
variance of the composite estimate is unchanged from that of the regular ratio 
estimate. 

The within-p.s.u. variance of the composite estimate is derived below, where it 
will be understood that the symbols for variance and covariance refer only to the 
within-p.s.u. component. 

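Consequently from (4a) 

$$\sigma^2_{x''_u} = \sum_{i=0}^{u} \left[ K^{2i}\,\sigma^2_{y'_{u-i}} + 2K^{2i+1}\,\sigma_{y'_{u-i}\,y'_{u-i-1}} + 2K^{2i+2}\,\sigma_{y'_{u-i}\,y'_{u-i-2}} + 2K^{2i+3}\,\sigma_{y'_{u-i}\,y'_{u-i-3}} + \cdots \right] \tag{5}$$

$$\sigma^2_{x''_u} = \sum_{i=0}^{u} K^{2i}\,\sigma^2_{y'_{u-i}} + 2 \sum_{i=0}^{u} \sum_{j=1}^{u-i} K^{2i+j}\,\sigma_{y'_{u-i}\,y'_{u-i-j}} \tag{5a}$$

If 

$$\sigma^2_{y'_{u-i}} = \sigma^2_{y'_u}$$

for all i and if 

$$\sigma_{y'_{u-i}\,y'_{u-i-j}} = \sigma_{y'_u\,y'_{u-j}}$$

for all i and j, then (5a) becomes, for large u, 

$$\sigma^2_{x''_u} = \frac{\sigma^2_{y'_u}}{1-K^2} + \frac{2}{1-K^2} \sum_{j \ge 1} K^j\,\sigma_{y'_u\,y'_{u-j}} \tag{6}$$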
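
As a numerical illustration of the step from (5a) to (6), the sketch below evaluates the double sum directly for a long series with constant variance and lag-dependent covariances, and compares it with the closed form obtained for large u. The variance and covariance values are invented, covariances beyond the listed lags are taken as zero, and the function names are hypothetical.

```python
def variance_closed_form(var_y, cov_by_lag, K):
    """Large-u within-p.s.u. variance of the composite estimate.

    cov_by_lag[j-1] is the covariance between terms j months apart.
    """
    weighted = sum(K ** j * c for j, c in enumerate(cov_by_lag, start=1))
    return (var_y + 2 * weighted) / (1 - K ** 2)


def variance_direct(var_y, cov_by_lag, K, u):
    """Direct evaluation of the double sum with constant variance and covariances."""
    total = 0.0
    for i in range(u + 1):
        total += K ** (2 * i) * var_y
        for j in range(1, u - i + 1):
            cov = cov_by_lag[j - 1] if j <= len(cov_by_lag) else 0.0
            total += 2 * K ** (2 * i + j) * cov
    return total


# Invented values: unit variance, covariances 0.5 and 0.25 at lags 1 and 2.
closed = variance_closed_form(1.0, [0.5, 0.25], K=0.5)
direct = variance_direct(1.0, [0.5, 0.25], K=0.5, u=60)
print(closed, direct)
```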


B. Monthly Difference 

Since 

$$x''_u = y'_u + K\,x''_{u-1}$$

and 

$$x''_{u-1} = y'_{u-1} + K\,x''_{u-2}$$

then 

$$(x''_u - x''_{u-1}) = (y'_u - y'_{u-1}) + K\,(x''_{u-1} - x''_{u-2}) \tag{7}$$

Letting 

$$z''_u = x''_u - x''_{u-1}$$

and 

$$w'_u = y'_u - y'_{u-1}$$

we have from (7) 

$$z''_u = w'_u + K\,z''_{u-1} \tag{8}$$

which is of the same form as the composite estimate for monthly level. Thus the 
variance of the month-to-month difference is given by (6), with $y'_u$ in (6) being 
replaced by $y'_u - y'_{u-1}$; and 

$$\sigma^2_{z''_u} = \frac{\sigma^2_{w'_u}}{1-K^2} + \frac{2}{1-K^2} \sum_{j \ge 1} K^j\,\sigma_{w'_u\,w'_{u-j}}.$$








SAMPLING METHODS IN THE YUGOSLAV 1953 CENSUS 
OF POPULATION* 


S. S. Zarković 
Yugoslavia Federal Statistical Office 


PROGRAM 


DURING the preparations for the 1953 census of population in 
Yugoslavia two groups of problems appeared, the solution of 
which required the application of sampling methods. In the first group 
were problems common to all population censuses; the second contained 
a question of special interest to less developed countries. 

One of the problems in the first group related to the time required 
for processing the data. The censuses of 1921, 1931, and 1948 were 
tabulated by hand. The processing of the 1921 census lasted 10 years. 
The 1931 census was still incomplete at the beginning of the war in 
1941, when all of the material was lost. The processing of the March 
1948 census lasted to the end of 1949 in spite of its narrow scope and 
the great number of people engaged in it. 

Results so delayed lose much of their practical value. For the 1953
census it was essential that estimates of the most important facts be
prepared as quickly as possible, since social and economic measures
depended on census findings. The first problem, therefore, was to prepare
a set of preliminary estimates that could be used until better and more
complete figures became available.

As will be shown later, the application of sampling methods in this 
census was such that the preparation of advance estimates could be 
divided into two parts, (a) estimates prepared very quickly for a small 
number of characteristics, and (b) estimates prepared over a somewhat 
longer period for a larger number of characteristics. 

Another of the common group of problems is the completeness of 
enumeration of the population. Investigation of this point was impor- 
tant for several reasons. First of all, a measure was required of the 
accuracy of the population totals obtained by the enumerators for 
1953. Then, some statistical questions relating to the next census also 
called for putting this check on the program. Checking completeness
shows in what cases enumeration meets difficulties and what problems,
in the given system of enumeration, should be considered more fully
in future work. It gives information on the capabilities of the available
enumerators, shows the advantages and deficiencies of the definitions
used, of the accepted system of enumeration, etc.

* The author is very much indebted to Morris H. Hansen, U. S. Bureau of the Census, who read
the manuscript and made a great number of suggestions for improving it.

The next problem is the measurement of the volume of errors in
answers to census questions. This is a modern theme and a very
important one for the proper formulation of questions, for the
organization of the census, and for the users of statistical data as well. The
general importance of the problem of errors in answers is clear enough, 
but such research has a special value in countries having a relatively 
low level of cultural development, with a large percentage of illiterate 
people, who are not familiar with the concepts used in the census. 
Measuring the volume of errors under such conditions makes it possible 
to see what can be done by the census and what reliance can be placed 
on the data obtained. The usefulness of such research is much greater 
if it is designed in such a way as to give the percentage of wrong answers 
on a question and information on the sources of errors as well. This 
information is valuable in contemplating measures for reducing errors 
or for improving answers. 

The last problem in this group is the quality of the editing. Al- 
though a great many errors appearing on census questionnaires are 
caught in the process of editing, by no means all are eliminated. And 
it is known that many of those remaining on the questionnaires 
could in principle have been corrected. It was important, therefore, 
to determine to what extent errors of different types were corrected 
in editing in this census, and, more generally, to what degree the editing 
approached the ideal in quality of work. Without answers to these 
questions it is not possible to gain full insight into the reliability of the 
census results or to prepare adequate editing procedures for the next 
census.

In the second group of problems—those of special concern to less 
developed countries—we undertook a study of the accuracy and value 
of literacy data. As explained in a previous article published in this 
Journal [12], the first impetus for doing so came from the strange course 
of the figures on percentage of illiterates as shown in the previous 
censuses. These results were as follows: 


Census      Percentage of Illiterates

1921                50.5
1931                44.6
1948                25.4
1953                24.9¹

¹ This figure represents the sample estimate.





722 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1955 


The sharp drop in illiteracy from 1931 to 1948 (averaging over one 
per cent a year) and the negligible decrease from 1948 to 1953 seem 
clearly open to question, since the former period included the war years, 
when most of the schools were closed, while the latter period was char- 
acterized by an extensive expansion in the country’s educational system 
and by special efforts to eliminate illiteracy. 
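
The two rates of decline contrasted here can be checked directly from the census percentages. The short Python sketch below is only an arithmetic illustration; the function name is ours, and the length of each period is taken simply as the difference between census years.

```python
# Illiteracy percentages from the census table above.
illiteracy = {1921: 50.5, 1931: 44.6, 1948: 25.4, 1953: 24.9}

def annual_drop(start_year, end_year):
    """Average decline in percentage points per year between two censuses."""
    change = illiteracy[start_year] - illiteracy[end_year]
    return change / (end_year - start_year)

print(round(annual_drop(1931, 1948), 2))  # 1.13
print(round(annual_drop(1948, 1953), 2))  # 0.1
```

The first period thus shows a decline of somewhat more than one point a year, the second about a tenth of a point a year.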

In addition to this uncertainty as to the reliability of the census 
figures, there was need for further light on the meaning of the replies 
given regarding literacy. Within the group reporting themselves as
able to read and write, what degrees of proficiency were represented? 
How many were fully literate, how many moderately so, and how 
many were on the borderline of complete illiteracy? 

In summary, then, the application of sampling methods in this 
census covered the following program: 

(i) first advance estimates, 
(ii) second advance estimates, 
(iii) checking the completeness of enumeration, 
(iv) investigation of the volume of errors in answers, 
(v) checking the value of literacy data, and 

(vi) checking the quality of editing. 

First, we shall give some data on the samples used in carrying 
through this program and then we shall indicate briefly what was 
done and what results were obtained with each particular problem. 


THE GENERAL SAMPLE 


For all phases of the program, enumeration districts were used as 
the primary sampling units. For parts (ii) and (vi) special samples 
were drawn, but for the other four parts of the work the same primary 
sample was employed. Information regarding this general sample will
therefore be given here. The samples for parts (ii) and (vi) will be de- 
scribed later. 

Ten days before the beginning of the census the total number of 
enumeration districts in each administrative unit (commune, county, 
republic) was known centrally. On the basis of the administrative 
division all the enumeration districts were stratified into two strata: 
rural and urban. From the urban stratum 100 enumeration districts 
were selected and from the rural stratum 149. The selection of enumera- 
tion districts as primary units was done with equal probabilities but 
with a smaller sampling fraction in the rural stratum than in the urban. 
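
A selection of this kind can be sketched in a few lines of Python. The sketch is illustrative only: the frame sizes are those of the preliminary count, the numbering of districts from 1 to N within each stratum and the random seed are hypothetical, and the actual mechanics of the draw are not described in the text.

```python
import random

# Frame sizes (preliminary count) and sample sizes as given in the text.
FRAME = {"urban": 29_805, "rural": 89_194}
SAMPLE = {"urban": 100, "rural": 149}

def select_districts(seed=1953):
    """Draw an equal-probability sample of district numbers in each stratum."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    return {s: sorted(rng.sample(range(1, FRAME[s] + 1), SAMPLE[s])) for s in FRAME}

def sampling_fraction(stratum):
    """Share of the stratum's enumeration districts that enter the sample."""
    return SAMPLE[stratum] / FRAME[stratum]

sample = select_districts()
for s in FRAME:
    print(s, len(sample[s]), f"{100 * sampling_fraction(s):.2f} per cent")
```

With these counts the urban fraction is about twice the rural one, in line with the statement above.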

In determining the size of the sample of primary units two consider- 
ations were decisive. The first was the available number of inspectors. 
The study program was organized in such a way that only one
inspector was assigned to each enumeration district and there he had to
do a rather complicated job. Therefore the inspectors’ training needed 
to be intensive. But due to conditions under which the surveys were 
prepared the number of qualified inspectors was limited and so the 
sample of primary units had to be less than 300. The second considera- 
tion related to point (iii) of the program. It was desired that the meas- 
urement of the completeness of enumeration should give an estimate 
of the total population with a coefficient of variation that would not 
exceed 0.30 per cent. On the basis of advance speculations we expected 
the coefficient of correlation between enumeration districts of the 
number of residents as obtained by the census and the same number as 
obtained by the check to be as high as 0.995. It was possible, therefore, 
to satisfy the above requirement with a sample of 250 enumeration 
districts. 
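
The role of the high correlation can be illustrated with a rough calculation. The text does not give the formula that was used; the Python sketch below assumes the usual large-sample approximation for a regression-type estimate, in which the unit variance is reduced by the factor (1 - rho^2), and it ignores stratification and the finite population correction. The function names are ours.

```python
import math

def cv_of_estimate(unit_cv, rho, n):
    """Approximate coefficient of variation of a regression-type estimate."""
    return unit_cv * math.sqrt(1 - rho**2) / math.sqrt(n)

def max_unit_cv(target_cv, rho, n):
    """Largest between-district coefficient of variation compatible with the target."""
    return target_cv * math.sqrt(n) / math.sqrt(1 - rho**2)

# Target of 0.30 per cent, expected correlation 0.995, 250 districts.
print(round(max_unit_cv(0.003, 0.995, 250), 2))  # 0.47
```

Under these assumptions a sample of 250 districts meets the requirement so long as the coefficient of variation of district size does not exceed roughly 47 per cent.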

The sample of primary units was used without subsampling to get 
answers on points (i) and (iii) of the program. For points (iv) and (v) 
it was necessary to subsample individuals from the selected enumera- 
tion districts. For each of these parts of the program the second stage 
sample was different, since the populations of individuals covered in the 
two studies were not identical. Further information on these samples 
will be given in subsequent sections. 

Table 1 contains information on resident population and on the 
number of enumeration districts in the population and in the sample, 
classified by strata. More details on this sample are given in [1]. 


TABLE 1
BASIC DATA ON POPULATION AND SAMPLE

                                                    Urban      Rural
                                                    Stratum    Stratum

Resident population (in thousands)                   4,832     12,095
Total number of enumeration districts               29,805     89,194
Number of enumeration districts in the sample          100        149
Per cent of enumeration districts in sample              …         15
Per cent of persons in sample                            …         16





All of the data in this table are based on the preliminary count. 


FIRST ADVANCE ESTIMATES 


The inspectors employed in this program came to their enumeration 
districts eight to ten days after the census day. At this time enumera- 
tion was over and for each person in the country a filled questionnaire 
existed (theoretically). To get the first advance estimates it was initially
planned that the inspectors should take with them all questionnaires
for the persons enumerated in their districts. The processing of these 
questionnaires would make possible the desired estimates. This system 
would give results without any duplication in processing because the 
punch cards for the sample districts can be put later in their proper 
place among the others. 

The urgent need for data changed this plan in the following way: 
answers of all individuals enumerated in a sample district were sum- 
marized on a sheet by the inspector assigned to that district, and the 
totals so obtained were used to get the estimates. The census day was 
March 31 and these first estimates were ready at the end of April. So 
the basic results of the census were known immediately after the 
enumeration was completed. The following groups of tables were in- 
cluded in these estimates: (i) demographic data, (ii) nationality, (iii) 
literacy, (iv) degree of education, (v) religion, and (vi) economically 
active population. 
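
The step from district totals to an estimate can be sketched as follows. The sketch assumes a ratio estimate within a single stratum, with enumeration districts as sampling units, and the usual linearized variance without finite population correction; the district figures are hypothetical and the function names are ours.

```python
def ratio_estimate(y_totals, x_totals):
    """Ratio of sample totals, e.g. literate persons to all persons."""
    return sum(y_totals) / sum(x_totals)

def ratio_relative_se(y_totals, x_totals):
    """Linearized relative standard error of the ratio, districts as units."""
    n = len(y_totals)
    r = ratio_estimate(y_totals, x_totals)
    x_bar = sum(x_totals) / n
    resid_ss = sum((y - r * x) ** 2 for y, x in zip(y_totals, x_totals))
    var_r = resid_ss / (n * (n - 1) * x_bar**2)
    return var_r**0.5 / r

# Hypothetical totals from the inspectors' summary sheets for four districts.
literate = [90, 60, 120, 75]
persons = [120, 100, 150, 110]
print(ratio_estimate(literate, persons))      # 0.71875
print(ratio_relative_se(literate, persons))
```

The standard error depends only on how far each district departs from the common ratio, which is why characteristics concentrated in particular districts are estimated less precisely.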

From the point of view of variances the system of estimation used 
was not efficient as compared with a simple random sample of indi- 
viduals, since some characteristics, such as branches of industry, oc- 
cupational groups, and in general all economic characteristics, have a 
relatively high intraclass correlation within enumeration districts. For 
comparison we give the following data. The relative standard errors 
of the proportions of literate persons, as obtained by using the ratio 
method of estimation with enumeration districts as sampling units, 
were 2.5 per cent in the rural stratum and 1.9 per cent in the urban 
stratum. If an equal number of individuals were selected at random,
the variances of the corresponding simple unbiased estimates would
be 0.32 and 0.27 per cent respectively. The difference in efficiency is
considerable, but since obtaining these estimates was inexpensive the 
system will probably be retained in the future, the only change being 
that the number of primary units will be increased. It is easy to use 
the larger sample of primary units for this purpose because the com- 
piling of the necessary data is simple and does not require the presence 
of a trained inspector. 
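
The size of the loss in efficiency can be put in terms of a variance ratio. The Python sketch below assumes that the figures 0.32 and 0.27 are relative standard errors in per cent, on the same scale as 2.5 and 1.9; if they are in fact variances the calculation does not apply. The function name is ours.

```python
def variance_ratio(rse_cluster, rse_srs):
    """Ratio of variances implied by two relative standard errors."""
    return (rse_cluster / rse_srs) ** 2

rural = variance_ratio(2.5, 0.32)
urban = variance_ratio(1.9, 0.27)
print(round(rural), round(urban))  # 61 50
```

On this reading the district sample has a variance some fifty to sixty times that of a simple random sample of the same number of persons.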

In similar cases we are usually interested in subsampling enumeration 
districts. In this work the subsampling was not used since it was found 
it would not decrease the costs. 


SECOND ADVANCE ESTIMATES 


The sample of 249 enumeration districts was not sufficient to yield 
satisfactory estimates in those cases in which a cross classification was
necessary. It also was inadequate for preparing separate estimates for
the six republics. Thus there appeared to be need for another sample, 
larger in size. This second sample was desirable also because of the fact 
that the individual data used in preparing the first estimates were not 
previously submitted to editing, and because the regular processing of 
the census was expected to be finished toward the end of 1955. 

The second sample, designed to meet the need for more detailed 
figures, also consisted of enumeration districts as primary sampling 
units; these were selected with a different sampling fraction for each 
republic, the fractions varying between 0.14 and 0.67. In the second 
stage of selection 10 per cent of households were taken. The use of 
different sampling fractions in the first stage selection resulted from the 
desire to have the sampling errors on pretty much the same level in 
each republic. 
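
The effect of the two stages on the expansion of the sample can be illustrated briefly. The Python sketch below assumes that the over-all probability of selection of a household is the product of the first stage fraction for its republic and the 10 per cent second stage fraction, and that estimates are inflated by the inverse of that probability; the function name is ours.

```python
HOUSEHOLD_FRACTION = 0.10  # second stage fraction stated in the text

def expansion_weight(first_stage_fraction):
    """Inverse of the over-all selection probability of a household."""
    return 1.0 / (first_stage_fraction * HOUSEHOLD_FRACTION)

for f1 in (0.14, 0.67):  # the two extreme first stage fractions quoted in the text
    print(f1, round(expansion_weight(f1), 1))
```

Each sample household thus stands for about 71 households where the first stage fraction is 0.14 and about 15 where it is 0.67.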

The relative efficiency of the household as the second stage sampling 
unit in comparison to the simple random sample of individuals is also 
low for some characteristics. An experiment was carried out in one re- 
public to compare the relative efficiencies of different sampling units 
and different methods of estimation. The results of this experiment are 
summarized in Table 2 for some selected items, with the variance of a 
simple random sample of persons taken as 100. The stratification was 
done according to size of household. 

On the basis of such data and after having taken respective costs 
into consideration, the household was selected as the more economical 
secondary unit. 

The procedures used in connection with this sample, the results ob- 
tained, and the discussion of problems faced are presented in references 
[2, 4, 6, and 8]. The latter two publications also contain some tables 
showing the magnitude of standard errors broken down by items, 
republics and provinces. 


COMPLETENESS OF ENUMERATION 


For evaluating the completeness of census enumeration the general 
sample of 249 primary units was used, without subsampling, according 
to the following plan. 

After the enumeration was completed, inspectors were sent to the 
selected enumeration districts provided with maps on which the borders 
of the districts were drawn. Their duty was to visit all dwelling units 
and make a new count of all persons, entering only their names and 
classifying them according to three groups: (i) permanently present, 
(ii) temporarily absent, and (iii) temporarily present. The first two
groups constitute the resident population. In this new count the
inspectors did not know the results obtained in the census. Their task
was to reproduce the situation on the census day.

TABLE 2
RELATIVE EFFICIENCIES OF DIFFERENT SAMPLING UNITS
AND DIFFERENT METHODS OF ESTIMATION*

                                         Sampling Unit
                              Person     Household
                              Simple     Simple     Stratified   Stratified
                              random     random     sample       sample
Item                          sample     sample

Males                          100        100          70          147
Illiterates                    100        100          71           71
Persons by year of birth
  1940-1948                    100        108          59           87
  1900-1909                    100        104          90          100
  1879                         100        110          91           91
Employed persons
  Total                        100        102          52           52
  Workers                      100        100          66           73
  Employees                    100        100          80           81
  Farmers                      100        100          30           42
  Others                       100        100          92           92

* The figures in this table are published in somewhat different form in Blejec [4].

After their count was finished the local population census commis- 
sion (a group of local inhabitants charged with supervising the census 
operations in the field) matched the results of the checking against the 
results of the census. Lists of the names made in checking by house- 
holds were used to facilitate this matching. If differences appeared in 
any case they were subjected to a new check by the commission. The 
results obtained in this new check were final. Thus data obtained by 
inspectors were not considered as definitely correct. This practice was 
justified later by the errors found in the inspectors' work. In some cases they
failed to count a person and in still more cases their classification of
people under the three above groups was not correct.

This matching of the two sets of results serves also to eliminate errors 
that might appear because of the fact that the inspectors necessarily 
make their check of the completeness of enumeration at least a week
after the census day. This time interval between the two surveys could
be reduced by the simultaneous work of both enumerators and inspec- 
tors, but this would not give the desired result, since the enumerator 
would be informed about the check on his district and the whole pur- 
pose of checking would be lost. Thus a difference of several days is 
necessary. But if inspectors, because of population changes, were not 
always successful in reproducing the situation on the census day, the 
above matching and the following check should reduce their errors. 

On the basis of the results of this survey an indication of the quality 
of enumerators’ work can be arrived at in the following way. All per- 
sons in the selected enumeration districts were classified in one of the 
following classes: 

(i) persons not classified by enumerator (sign —), 

(ii) persons given the same classification by both enumerator and 
inspector (sign =), 

(iii) persons enumerated in the two surveys but differently classified
(sign ≠),

(iv) persons classified by enumerator but not by inspector (sign +). 

The percentages of cases falling in these classes are presented in 
Table 3. They show the different aspects of the quality of enumerators’ 
work. 
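
The fourfold classification can be expressed as a simple matching rule. The Python sketch below is an illustration only: it assumes that each person carries one of the three groups (or nothing, if not listed) in the census and in the final result of the check, and the sample records are hypothetical.

```python
def classify(census_group, check_group):
    """Return the sign for one person; None means the person was not listed."""
    if census_group is None and check_group is None:
        raise ValueError("person appears in neither list")
    if census_group is None:
        return "-"   # not classified by enumerator
    if check_group is None:
        return "+"   # classified by enumerator but not by inspector
    return "=" if census_group == check_group else "≠"

def class_percentages(pairs):
    """Percentage of persons falling in each of the four classes."""
    counts = {"-": 0, "=": 0, "≠": 0, "+": 0}
    for census_group, check_group in pairs:
        counts[classify(census_group, check_group)] += 1
    total = sum(counts.values())
    return {sign: 100.0 * c / total for sign, c in counts.items()}

# Hypothetical matched records: (group in census, group in check).
records = [("present", "present"), ("absent", "present"),
           (None, "present"), ("present", None)]
print(class_percentages(records))
```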

As regards the number of persons in the resident population, the
figures in Table 4 are a rough estimate.

TABLE 3
PERCENTAGES OF CASES FALLING IN THE FOUR CLASSES

                           Percentage
Class           Rural Stratum     Urban Stratum

…                    0.23              0.83
…                    0.26              0.45
…                    0.58              1.38
…                   98.93             97.34

TABLE 4
ESTIMATES OF THE COMPLETENESS OF ENUMERATION OF
THE RESIDENT POPULATION

                            Stratum                  Total
                        Rural      Urban       Number    Percentage

Under enumeration      27,500     21,500       49,000        …
Over enumeration       40,000     11,500       51,500        …
Gross error            67,500     33,000      100,500        …
Net error             +12,500    -10,000       +2,500        …

It will be seen that while a net underenumeration was found in the
urban stratum, the rural stratum and the total population showed a
net overenumeration. This result is somewhat unusual. Hansen,
Hurwitz, and Pritzker [7] have shown an over-all net underenumeration
in a similar check carried out in connection with the 1950 U. S.
census of population. This seems to be a logical expectation in
statistical surveys. But our result is explained by the fact that many people
belonging to the rural stratum have a job outside the place where their
families live. This is probably because of the housing shortage. Some of
these people come home once a week, and others only from time to
time. Census instructions concerning such persons were complicated
and the result was double enumeration, at home and in the place of
work. Similar difficulties were also met in the case of students absent
from their parents' homes to attend school.
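
The gross and net rows of Table 4 follow from the first two rows by addition and subtraction, as the following Python sketch shows. The sign convention (a positive net error meaning overenumeration) is inferred from the table; the names are ours.

```python
# (under enumeration, over enumeration) by stratum, from Table 4.
TABLE_4 = {"rural": (27_500, 40_000), "urban": (21_500, 11_500)}

def gross_and_net(under, over):
    """Gross error is the sum of both kinds; net error is over minus under."""
    return under + over, over - under

total_under = sum(u for u, _ in TABLE_4.values())
total_over = sum(o for _, o in TABLE_4.values())

for stratum, (under, over) in TABLE_4.items():
    print(stratum, gross_and_net(under, over))
print("total", gross_and_net(total_under, total_over))
```

The small net figure for the whole country is thus the balance of two much larger gross errors of opposite sign.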

Some comment should be added regarding the degree of complete- 
ness of enumeration. Judging by the results, the enumeration was 
excellent from the point of view of completeness. It goes without say- 
ing that this could arise either from a very good initial enumeration, or 
from an inadequate check, whatever the quality of the original enumer- 
ation. The full answer on what really happened here is almost beyond 
practical possibilities because it would require further superimposed 
checks. The only additional evidence in favor of the general character 
of our results is a recheck made on 21 enumeration districts; the find- 
ings were the same except for one person. Other and more subjective 
evidence appears in the reports on this point from the general field 
inspectors of census operations; they also talk about the good work of 
the regular enumerators. 

For the improvement of census methods and techniques in this sort 
of research, great importance attaches to the analysis of cases in which 
the census and the check disagree. It was found here that disagreement 
rarely appears because the enumerators failed to classify someone. 
Most of the differences relate to difficulties met in applying the
definitions to individual cases. For illustration we might mention the
examples of two enumeration districts. In the first district a tubercular
sanatorium was located, and in the second a building site for a new
factory. Each of these districts had frequent population changes; some 
people would remain for a month, others for several months and even 
years. In these circumstances it is almost impossible for an enumerator 
to carry through the census definitions. Furthermore the definitions 
themselves were not always formulated in such a way as to fit the 
cases met in these two enumeration districts. Details concerning this 
check are given in [10]. 


ERRORS IN ANSWERS 


Errors in answers on census questions may appear in connection 
with any enumerated person regardless of age or other personal char- 
acteristic and also regardless of whether the respondents filled out the 
questionnaire themselves (as was provided in instructions) or whether 
it was prepared by the enumerator (for illiterates, children, etc.) or by 
some third person (as actually happened in many cases). Accordingly, 
for the second stage sample used in this part of the research program, 
all individuals enumerated in the 249 selected primary units had to be 
taken into account. In this case the sample of secondary units was 
drawn by inspectors in the field. As a frame they used the filled
questionnaires and applied the sampling fraction of 1:8 in the rural stratum
and 1:10 in the urban stratum. Thus in the rural stratum, a total of 
2,470 individuals was selected and 1,684 persons in the urban stratum. 
Different sampling fractions in the two strata were used in order to get 
an approximately equal distribution of work among the inspectors. 
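
The selection in the field and the resulting work load can be sketched as follows. The text states only the sampling fractions; the systematic selection with a random start shown here is an assumed procedure, and the stack of 133 questionnaires and the seed are hypothetical. The work load figures use the sample sizes quoted above and the 149 rural and 100 urban districts, with one inspector for each district.

```python
import random

def systematic_sample(items, interval, rng):
    """Take every interval-th item after a random start (an assumed procedure)."""
    start = rng.randrange(interval)
    return items[start::interval]

def workload_per_inspector(persons_selected, districts):
    """Average number of persons to be re-interviewed by one inspector."""
    return persons_selected / districts

rng = random.Random(0)        # hypothetical seed
stack = list(range(1, 134))   # hypothetical stack of 133 filled questionnaires
print(len(systematic_sample(stack, 8, rng)))
print(round(workload_per_inspector(2470, 149), 1))  # 16.6
print(round(workload_per_inspector(1684, 100), 1))  # 16.8
```

The two fractions therefore leave each inspector with between 16 and 17 persons on the average in either stratum.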

The fieldwork was carried out as follows. Without being informed of 
the answers given previously in the census, inspectors got in touch with
each person in the sample and again obtained answers to the more
important census questions. In doing so they used some control questions
and asked for documents to prove the accuracy of answers obtained 
(provided documents were available and the respondents were willing 
to show them). Their results were then matched against the census 
results. In this matching, identical answers were defined as correct 
answers; for most circumstances this may be considered as a sound sup- 
position, but it does not necessarily imply an absolutely correct answer. 
For example, if someone, for one reason or another, gives a wrong an- 
swer on the question “Occupation” and later declares the same to the 
inspector, his answer in this check would be considered as correct in 
spite of the fact it is actually wrong. Here again the problem appears 
as to how far one can go in using different means of checking. For
reasons of simplicity and economy our procedure was as described,
introducing the corresponding limitation in the value of the results obtained.

As opposed to this, cases in which differences were found were sub- 
jected to a new check, made this time by more rigorous methods. In- 
spectors themselves and the members of the local population com- 
missions were charged in such cases to look for documents if available 
at accessible places, to talk to other persons able to provide informa- 
tion about the individual in question, or to get again in touch with that 
individual, throwing more light on the problem by using additional 
questions. The resulting answer was accepted as correct. 

For the final analysis, the following definition of error was used: each 
answer in the census is wrong if it is not identical with the correspond- 
ing answer in the check. In addition to this class of errors, formal errors 
were also taken into account. An example of these errors is the following: 
If a person on the question “Degree of education” entered a dash, this 
was counted as an error, since the instructions explicitly required that 
answers be written in full. Thus the concept “Volume of errors” 
adopted in the analysis contained both classes of errors. 
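
The definition can be written out as a counting rule for one questionnaire. The Python sketch below is illustrative: the text gives only the dash as an example of a formal error, so the treatment of a blank answer as a formal error is our assumption, and the answers shown are hypothetical.

```python
# Assumption: a dash (or a blank) in place of a written answer is a formal error.
FORMAL_ERROR_MARKS = {"-", "—", ""}

def count_errors(census_answers, check_answers):
    """Number of questions whose census answer is formally wrong or differs from the check."""
    errors = 0
    for question, final_answer in check_answers.items():
        given = census_answers.get(question, "")
        if given.strip() in FORMAL_ERROR_MARKS or given != final_answer:
            errors += 1
    return errors

census = {"Degree of education": "-", "Occupation": "farmer", "Literacy": "literate"}
check = {"Degree of education": "none", "Occupation": "farmer", "Literacy": "literate"}
print(count_errors(census, check))  # 1
```

A questionnaire for which this count is zero is a correct questionnaire in the sense used in the analysis that follows.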

Considering first the questionnaire as a whole, and defining as correct 
those showing no differences with the check, and no formal errors on 
any questions, we find the results presented in Table 5, where the six 
republics are ranked according to the percentage of correct
questionnaires. It is interesting to note that these ranks correspond to the
ranking of the republics according to the level of their cultural
development as measured by the percentage of literate people and the degree
of education. For the professional statistician this is important
information.

TABLE 5
RANKS OF REPUBLICS ACCORDING TO PERCENTAGE OF
CORRECT QUESTIONNAIRES

Republic                      Percentage

Slovenia                          …
Croatia                           …
Serbia                            …
Montenegro                        …
Bosnia and Herzegovina            …
Macedonia                         …

If we classify the questionnaires by the number of errors in them
we get the percentages shown in column 1 of Table 6. Analyzing the
results by sex, it was found that the percentage of wrong answers for
males is somewhat lower than the percentage for females.

TABLE 6
PERCENTAGES OF QUESTIONNAIRES WITH A GIVEN
NUMBER OF ERRORS

                         Percentage of Questionnaires
Number of Errors       Without editing     After editing

…                            …                   …

With regard to the percentage of wrong answers on particular census 
questions, the results obtained for the rural and the urban strata are 
presented in columns 1 and 2 of Table 7. These percentages are cal- 
culated with respect to the total number of persons in the sample. This
is why some of them are low. This is the case particularly with ques-
tions which only a limited number of people answer (e.g., the number 
of live births). It will be noted that there is a high rank correlation 
between the percentages of wrong answers in the two strata. 
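
A rank correlation of this kind may be computed as in the following Python sketch, which uses Spearman's coefficient in its simple form for untied values. The percentages shown are hypothetical and serve only to illustrate the calculation; they are not the figures of Table 7.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation, 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n**2 - 1))

# Hypothetical percentages of wrong answers on five questions.
rural = [30.0, 2.5, 2.0, 8.0, 5.0]
urban = [20.0, 3.0, 1.0, 6.0, 7.0]
print(spearman(rural, urban))
```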

As a further step in the study, the possibility of correcting errors in 
answers by editing was considered. The answers as finally determined 
by the check were known and hence it was possible to estimate what 
would happen to errors as a result of maximum editing. For this pur- 
pose three classes of errors need be distinguished. In the first class are 
errors that can be corrected in editing and that do not appear in 
tabulation. An example of such an error occurs in determining age from 
the date of birth. If the age is tabulated by years, those errors in dates 
of birth that do not change the age in full years disappear as errors 
during tabulation. The second class comprises errors that can be 
identified as inconsistencies but can be corrected only if some additional 
information from the field were obtained for the particular person. 
In the third class are those errors that can not be eliminated because 
there is no basis for doubt regarding the reliability of the answers 
given. We have such a case if someone falsely reports his date of birth
by several years or enters the wrong occupation or the wrong degree
of education.

TABLE 7
PERCENTAGES OF WRONG ANSWERS ON CENSUS QUESTIONS

                                            Percentage of Wrong Answers
                                        Without editing        After editing
Question                                Rural     Urban       Rural     Urban
                                        stratum   stratum     stratum   stratum
                                          (1)       (2)         (3)       (4)

Date of birth                              …         …           …         …
Place of birth                             …         …           …         …
Marital status                             …         …           …         …
Number of times married                    …         …           …         …
Age at first marriage                      …         …           …         …
Number of live births                      …         …           …         …
Number of children on the census day       …         …           …         …
Legal nationality                          …         …           …         …
Literacy                                   …         …           …         …
Degree of education                        …         …           …         …
Occupation                                 …         …           …         …
Occupational status                        …         …           …         …
Industry                                   …         …           …         …
Economic sector                            …         …           …         …
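
The first class of errors described above, those that disappear in tabulation, can be illustrated for the date of birth. The Python sketch below assumes that age is tabulated in completed years as of the census day, March 31, 1953; the dates used are hypothetical and the function names are ours.

```python
from datetime import date

CENSUS_DAY = date(1953, 3, 31)  # census day given in the text

def age_in_full_years(birth, on=CENSUS_DAY):
    """Age in completed years on the census day."""
    before_birthday = (on.month, on.day) < (birth.month, birth.day)
    return on.year - birth.year - before_birthday

def survives_tabulation(reported, true):
    """True if a wrong date of birth still changes the tabulated age."""
    return age_in_full_years(reported) != age_in_full_years(true)

true_birth = date(1920, 5, 10)
print(survives_tabulation(date(1920, 5, 15), true_birth))  # False
print(survives_tabulation(date(1920, 3, 15), true_birth))  # True
```

An error of five days leaves the tabulated age unchanged in the first case, while an error of about two months in the second case moves the person into the next year of age.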

In order to estimate the maximum effect of editing or the minimum 
percentage of errors that would necessarily remain in the tabulated 
census figures, we assume that the editing will be carried through 
ideally, i.e., all errors of the first and the second class will be corrected. 
This assumption appears justified because of the fact that the statis- 
tical service in Yugoslavia has a very wide-spread field staff which 
was used in this census to get the additional information needed regard- 
ing some persons for correcting errors of the second class. The fre- 
quencies of errors in questionnaires after editing are shown in column 
2 of Table 6; the percentages of errors that remained in answers to 
various census questions after editing are presented in columns 3 and 4 
of Table 7. As one sees, editing may considerably reduce the number of 
errors but its influence is not equal for all questions. 

The results of further analysis for some particular census questions 
might be noted.

In connection with answers on the question “Date of birth” the
following was found: 

(i) the percentage of wrong answers is larger in the rural stratum;

(ii) more correct answers are given by males than by females;

(iii) the rank of the republics according to the percentage of wrong 
answers on this question is the same as their rank according to the 
degree of cultural development; 

(iv) no systematic tendency to increase or to lower the age was 
found among the wrong answers; 

(v) the wrong dates of birth show the characteristic tendency of 
concentration about certain dates, primarily about those ending in 0 
or 5. This holds for the two strata; 

(vi) there are some dates with relatively low frequencies, such as 2, 
3, 22, 23, 24, 27, 29, 30 and 31; 

(vii) the frequency of births, according to wrong answers, is con- 
siderably larger in the first 15 days of months than in the remainder 
although in the latter there are more days (because of the 31st in some 
months). 
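
Findings (v) and (vii) can be judged against what an even spread of births over the calendar would give. The Python sketch below takes "dates ending in 0 or 5" to mean days of the month and uses a year of 365 days for the comparison; the reported days are hypothetical and the function names are ours.

```python
import calendar

def expected_shares(year=1953):
    """Shares of days ending in 0 or 5, and of days 1 to 15, in a calendar year."""
    lengths = [calendar.monthrange(year, m)[1] for m in range(1, 13)]
    total = sum(lengths)
    round_days = sum(1 for n in lengths for d in range(1, n + 1) if d % 5 == 0)
    first_half = 15 * 12
    return round_days / total, first_half / total

def observed_shares(days_of_month):
    """The same two shares among reported days of birth."""
    n = len(days_of_month)
    round_share = sum(d % 5 == 0 for d in days_of_month) / n
    first_half_share = sum(d <= 15 for d in days_of_month) / n
    return round_share, first_half_share

reported_days = [1, 5, 10, 15, 20, 1, 12, 25, 10, 28]  # hypothetical wrong answers
print(expected_shares())
print(observed_shares(reported_days))
```

With births spread evenly, about 19 per cent of dates would fall on days ending in 0 or 5 and about 49 per cent in the first 15 days of the month; shares well above these indicate the concentration described.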

With regard to the question “Age at first marriage” it was found 
that males had no tendency to change the real age in either direction. 
For females the results show a definite tendency to overstate this age 
(probably because some girls marry very young). 

Understatement was found in wrong answers on the question “Num- 
ber of live births.” 

For literacy no tendency was discovered either to over- or to under- 
state the real status. This contradicts to some extent the widespread 
opinion that a large proportion of illiterate persons report themselves 
as literate. 

With regard to the degree of education the same number of over- 
and understatements was found. The results also show that in the 
majority of cases the wrong answers change the degree of education 
for only one school year. In examining the sources of these errors an 
interesting result was found, i.e., the same tendency to round off the 
degree of education as was found in reporting date of birth. This 
rounding off is made on some characteristic educational level such as 
the completion of elementary school, high school, or a full college 
education. 

The analysis of wrong answers on the “Occupation” and “Industry” 
questions shows that a large proportion of the errors have their source 
in difficulties in determining the occupation of housewives, particularly 
in the rural stratum. This information will be very useful in preparing 
instructions and adequate definitions for the next census.

The problem of errors in this census, their analysis, the organization
of this survey and some related questions are discussed in [9] and
partially in [3].


THE VALUE OF LITERACY DATA 


Not all persons in the general sample of primary units have been 
included in the study of the value of literacy data. Since the census 
questions regarding literacy were confined to persons 10 years of age 
and over, those under 10 years of age were automatically excluded.
Furthermore, persons reported in the census as having completed ele- 
mentary school or more were not considered, as it was assumed that 
they were definitely able to read and write. And on parallel grounds, 
those reported in the census as illiterate were excluded from the 
analysis. From the remaining individuals a secondary sample was 
drawn, covering 417 persons in the urban stratum and 1,022 in the rural 
stratum. 

To investigate the degree of the ability to read and write, a series 
of special tests were prepared, with a system of points for right answers 
or successful achievement that allowed a maximum of 45 points for 
reading and ten points for writing. These tests were given by the in- 
spectors to each person included in the sample. The results obtained 
show that the literacy of the “literate” population is a continuous 
variable, which may have values falling anywhere within the range of 
zero to the maximum number of points possible in the tests. 

This brings out clearly the complexity of the problem of literacy 
data. Individual statements on literacy are merely the reflection of 
personal opinions as to what literacy is. It was found that persons may 
consider themselves literate when their ability to read and write is at 
such a low level that for any practical purpose they would have to be 
placed in the illiterate group. It is obvious that literacy statistics could 
be improved if one were able to give a quantitative definition of the 
degree of literacy required for an affirmative answer, but unfortunately 
this cannot be done in the census. The only practical means of throw- 
ing more light on the meaning of census data on literacy appears to be 
through a supplementary sample survey such as reported here. 

Further information regarding this study will be found in [1, 12, 13].


QUALITY OF WORK IN EDITING 


The study of editing was undertaken to determine the quality
of work in this phase of processing, the extent to which errors remained 





SAMPLING METHODS IN YUGOSLAV CENSUS 735 


that could have been corrected and the specific properties of these
errors that were responsible for their not being corrected.

Perhaps the best solution of this problem would be the use of quality 
control methods as described in [5] in connection with the verification 
of punching. This system, however, could not be used in this census 
because of the fact that the necessary professional personnel were 
engaged in other parts of the program. What was possible was a study
of editing on the basis of the corrections appearing on the census ques- 
tionnaires after the editing was completed. 

The experiments carried out so far show that this survey should be 
taken on a relatively large number of enumeration districts as primary 
units and on a small number of households as secondary units within 
each selected enumeration district. Since all editing of questionnaires 
for a particular enumeration district was handled by one person, the 
quality of work within enumeration districts was homogeneous. 
Furthermore, the households within enumeration districts also tend to 
be homogeneous with respect to many characteristics. A satisfactory 
insight into the quality of editing, therefore, requires a large sample of 
primary units and a small sample of households within each primary
unit. This procedure is supported by the fact that in our conditions of 
storing census material the access to data on both enumeration dis- 
tricts and households is relatively easy, while this does not hold for 
data on individuals. 
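
The reasoning above can be made concrete with the usual approximation for the variance of a mean under two-stage sampling with equal-sized clusters. This is a sketch only: the symbols a (number of enumeration districts sampled), m (households taken per district), S^2 (variance between households) and rho (intraclass correlation of households within a district) are assumptions introduced here for illustration and do not appear in the original study.

```latex
% Assumed notation: a primary units, m households per primary unit,
% S^2 the household variance, \rho the intraclass correlation.
% Finite population corrections are neglected.
\operatorname{Var}(\bar{y}) \approx \frac{S^{2}}{a\,m}\,\bigl[\,1 + (m-1)\,\rho\,\bigr]
```

When households edited by the same person are alike (rho well above zero), the factor in brackets grows with m, so for a fixed total a·m the variance is smaller with many districts and few households in each, which is the allocation the experiments pointed to.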

This analysis of the quality of editing is considered very important, 
since our study referred to earlier has shown that the classical way of 
editing answers on all questions could well be replaced by some more 
economical procedure. 

At the moment of writing this work is not yet completed. 


CONCLUSION 


The most important fields for the application of sampling methods 
in connection with censuses are as follows: (i) pretesting, (ii) getting 
answers on certain additional questions, (iii) coverage check, (iv) evalu- 
ation of the volume of errors in answers, (v) quality control of the 
processing, (vi) advance estimates, (vii) tabulation, and (viii) replace- 
ment of the complete census by the sample census. 

Not all of these points were included in the work reported here, since
the full program would have required a much longer period of prepara- 
tion and experimentation than we had at our disposal. In our opinion, 
a systematic use of sampling methods in connection with a census of
population should not be started on a broad front until a certain
amount of experience from the same or very similar work had become 
available. Before we started we did not know the answers to many 
questions of fundamental importance for such surveys. Will it be 
possible to find adequate inspectors? What will be the quality of their 
field work? How are the people going to react to our demands? What 
ways and resources will prove best in looking for proofs of the accuracy 
of census results? What is the magnitude of the components of the 
variances in different surveys? and so forth. Without having such 
basic information it would be hazardous to start with a large program. 
Therefore, the main emphasis in this work was put not on obtaining 
results of interest to the “statistical public” at large, but on securing
information about how to proceed in the next census. In this connec- 
tion a number of experiments are still going on at the moment of 
writing and still others are planned. But the results already obtained 
have clarified many problems important for the application of sampling 
methods to the points discussed above, and have also thrown light on 
many other dark places in the last census for which more adequate 
solutions will, we trust, be found in the next census. 


BIBLIOGRAPHY 


[1] Balaban, V.; Ispitivanje pismenosti [An inquiry into the literacy data],
Statistička revija, IV, 1954.

[2] Blejec, M.; Introductory remarks to “Vzorec prethodnih rezultatov popisa
prebivalstva 31 marca 1953 v LR Sloveniji” [Sample estimates of the results
of the census of population as of March 31, 1953], Statistical Office,
Ljubljana 1953.

[3] Blejec, M.; Kontrola točnosti zbranih podatkov popisa prebivalstva 31
marca 1953 v LR Sloveniji [Checking accuracy of the results of the popula-
tion census as of March 31, 1953], Statistical Office, Ljubljana 1953.

[4] Blejec, M.; Uzorak prethodnih rezultata popisa stanovništva u 1948 i 1953
godini u NR Sloveniji [Sample results of the census of population as of 1948
and 1953 in P.R. of Slovenia], Statistička revija, III, 231-42.

[5] Daly, F. J., and Guilford, L.; “Sample verification and quality control
methods in the 1950 census,” in Hansen, M. H., Hurwitz, W. N., and Ma-
dow, W. G.; Sample Survey Methods and Theory, New York 1953, Vol. I.

[6] —: Ekonomska obeležja stanovništva; rezultati uzorka [Economic character-
istics of the population: sample results], Statistical Bulletin No. 28, Federal
Statistical Office, Beograd 1954.

[7] Hansen, M. H., Hurwitz, W. N., and Pritzker, L.; “The accuracy of census
results,” American Sociological Review, 18 (1953), 416-23.

[8] —: Vitalna, prosvetna i socijalna obeležja stanovništva; rezultati uzorka [Vital,
educational, and social characteristics of the population: sample results],
Statistical Bulletin No. 29, Federal Statistical Office, Beograd 1954.

[9] Žarković, S. S.; Greške popisa stanovništva [Population census errors],
Studije i analize No. 3, Federal Statistical Office, Beograd 1954.

[10] Žarković, S. S.; Potpunost popisivanja [Completeness of enumeration],
Studije i analize No. 4, Federal Statistical Office, Beograd 1954.

[11] Žarković, S. S.; Rezultati uzorka [Estimating census figures], Studije i
analize No. 1, Federal Statistical Office, Beograd 1953.

[12] Žarković, S. S.; Sampling control of literacy data, Journal of the American
Statistical Association, 49 (1954), pp. 510-519.

[13] Žarković, S. S., and Balaban, V.; Ispitivanje pismenosti [An inquiry into
the literacy data], Studije i analize No. 7, Federal Statistical Office, Beograd
1955.





ON ADJUSTING SAMPLE TABULATIONS 
TO CENSUS COUNTS 


M. A. El-Badry
Cairo University
AND
F. F. Stephan
Princeton University


INTRODUCTION 


There are many occasions in which statisticians have to consider
how they can make the best use of data that are obtained by sampling,
particularly when they have available to them data from other
sources that can be used to improve the estimates they wish to derive 
from the sample. Problems of this kind are likely to require the atten- 
tion of statisticians to an increasing extent as the flow of information 
from sample surveys and sample tabulations drawn from established 
record systems expands. There are many indications of a strong trend 
in this direction. It promises to provide more data, more quickly, and 
estimates of greater value than the older sources of a more cumbersome 
or more casual nature. At the same time its value will depend on the 
fullest use of the data that are provided by other sources for the im- 
provement of sample estimates and also for the supplementing of each 
by the other. 

An outstanding example of this problem is presented by the numerous 
tabulations of data from samples that are part of the 1950 Census of 
Population. In these the connection between the sample and the other 
source of data is very close since the sample is taken in connection 
with the regular enumeration or by a selection from the returns after 
they are completed. By these procedures the Bureau of the Census 
has greatly enriched the information it has been able to publish about 
the American population. The introduction of an element of sampling 
error into the results is much more than offset by the value of the addi- 
tional variables and cross tabulations that sampling made possible. 

While the effect of sampling error may be a very reasonable price 
to pay for the additional data that sampling provides, a statistician very 
naturally tries to reduce the cost to a minimum. Moreover, he often 
finds it awkward to have two figures for the same item of information, 
one from the regular census count and the other from the sample esti- 
mate. He finds the public rather impatient in insisting that such figures 
be reconciled and that only the “correct” figure should be published. 







ADJUSTING SAMPLE TABULATIONS 739 


For his own purposes, however, he is likely to find that he is better off 
with the original figures for the sample as well as the full count, since 
he can then make the best use of both for the particular problem on 
which he is working. Just as he often finds after rounding a set of per- 
centages that he must adjust them further to meet the public expecta- 
tion that they total to exactly 100 per cent, so too, he often finds that 
he must make a somewhat artificial adjustment of sample estimates 
to make them add up to certain established totals. He very naturally 
wishes to hold to a minimum the cost of such arbitrary adjustments and 
at the same time to reduce the effect of sampling error. This leads to a 
search for economical and effective methods of adjusting estimates 
from samples to make them agree in certain ways with other data and 
to reduce as far as is practicable the elements of sampling error that re- 
main in the estimates after the adjustment. 

The search for good methods is not simple. The statistician recog- 
nizes that there are other sources of error than sampling in his data, 
that modern sampling is done quite often according to very complex 
designs, and that even with electronic equipment computing adjust- 
ments may be fairly expensive unless the procedures are simple. 
It may be worth while to sacrifice some part of the effectiveness of the 
adjustment in order to reduce the work to simple and economical methods
that will be widely understood, consistent from one set of data to
another, and readily reduced to routine. At the same time it is not worth
while to make needless sacrifices in the effectiveness of the methods 
and to this end it is possible to get some help from the application of 
statistical theory even when the full complexity of the problem is not 
adequately reflected in the theoretical models that are used. This paper 
is directed toward the development of the theory of some simple ad- 
justment procedures. 


AN EXAMPLE OF THE PROBLEM OF ADJUSTMENT 


Some of the questions that arise in the adjustment of sample esti- 
mates may be illustrated by the figures in Table I for the Standard 
Metropolitan Area of Austin, Texas, showing school enrolment by age.
For the population 5 and 6 years of age the sample agrees so closely 
with the census count that hardly any adjustment is needed to make 
them agree exactly. In contrast, the sample estimate of the population 
7 to 13 years old exceeds the count by 6 per cent and the sample esti- 
mate of the number of children enrolled in school is greater than the 
census count of all children of that age! This clearly shows the need for 
an adjustment of the estimates. It seems reasonable to make the
adjustment by simply multiplying the count of the population of this
age by the proportion of the children in the sample in this age group 
who were enrolled in school. This would correct most of the sampling 
error in the estimate of school enrolment. Similar but more moderate 
adjustments are needed for the ages 14-15 and 16-17. They can also 
be made for the older age groups. In the case of the older ages, however, 
the statistician may have noticed that there is evidence of a persistent 
tendency among the sample estimates for various areas to underesti- 
mate the number of adult males. This is the result of causes that are 
not fully understood. Among the causes there appears to be a slight 
tendency among enumerators to avoid putting the head of a household 
on the sample lines predesignated on the census schedules. It may be 
associated with the fact that somewhat more information is required 
in the case of heads of households than in the case of other members of 
their families. However, the listing procedures do not offer the enumera- 
tor any choice in the order of listing the head of the family and the tend- 
ency is probably limited to a minority of the enumerators. 

If this tendency is the source of the underestimate of the population
18 years old and older then the figures for school enrolment in these
ages may be about right or even possibly overestimates. The effect of
the adjustment in this case could be to increase the error of the
estimates of enrolment. The problem is complicated by the high proportion
of university students in this area and the likelihood that the effect of
sampling was different to some degree for them than for school children
almost all of whom were living at home. This example leaves un-
answered, then, some of the questions that arise in the use of sample
estimates in connection with other sources of data.


TABLE I*

ADJUSTMENT OF ESTIMATES OF SCHOOL ENROLMENT,
BY AGE, IN THE STANDARD METROPOLITAN
AREA OF AUSTIN, TEXAS, IN 1950

              Total Population            Enrolled in School
           ----------------------   --------------------------------
   Age      Census      Sample       Sample       Per      Adjusted
            count       estimate     estimate     cent     estimate

   5- 6      5,638       5,635        1,390       24.7       1,391
   7-13     15,298      16,180       15,350       94.9      14,513
  14-15      4,008       3,900        3,600       92.3       3,700
  16-17      4,154       4,305        3,050       70.8       2,943
  18-19      7,517       7,155        4,065       56.8       4,271
  20-24     19,641      19,310        7,490       38.8       7,618
  25-29     16,298      15,715        3,290       20.9       3,412

* The source of all the figures except the last column is 1950 Census of Population, volume II, part
43, Texas, 43-81 and 43-98. The sample was taken by designating every fifth person listed by each enu-
merator to be included in the sample. The sample estimates were formed by multiplying the sample tabu-
lations by 5. For additional details about the sampling see the introductory text of any of the parts of the
above reference. The adjusted estimate is formed by multiplying the sample estimate of children enrolled
in school by the census count for the same age group and then dividing the product by the sample esti-
mate of the population of that age group.

The method of adjustment that was used in Table I is the method 
that is suggested in the Census reports under the title “ratio estimates.” 
It brings the estimates into agreement with one set of census counts 
but frequently there are several counts to which the sample estimates 
can be adjusted and the adjustment to one of them leaves the estimates 
out of agreement with the others. If it is important that the estimates 
be completely consistent with several sets of counts, the adjustment 
must be done simultaneously for all and the procedure becomes more 
complicated. In this case more help is needed from statistical theory to 
develop methods that tend to reduce the effects of sampling error at the 
same time that they succeed in adjusting the estimates so that certain 
totals of them agree with data from other sources. 
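
The ratio adjustment described in the footnote to Table I can be checked with a few lines of Python. This is only an illustrative sketch: the figures are copied from Table I, while the names `TABLE_I` and `ratio_estimate` are hypothetical and introduced here for the example.

```python
# Figures copied from Table I (Austin, Texas, 1950).
# Each tuple: (age group, census count of population,
#              sample estimate of population,
#              sample estimate of persons enrolled in school)
TABLE_I = [
    ("5-6", 5638, 5635, 1390),
    ("7-13", 15298, 16180, 15350),
    ("14-15", 4008, 3900, 3600),
    ("16-17", 4154, 4305, 3050),
    ("18-19", 7517, 7155, 4065),
    ("20-24", 19641, 19310, 7490),
    ("25-29", 16298, 15715, 3290),
]


def ratio_estimate(enrolled_sample, census_count, population_sample):
    """Sample enrolment times census count, divided by sample population."""
    return enrolled_sample * census_count / population_sample


# Adjusted estimate for every age group, rounded to whole persons.
adjusted = {
    age: round(ratio_estimate(enrolled, count, population))
    for age, count, population, enrolled in TABLE_I
}

for age, value in adjusted.items():
    print(f"{age:>5}  {value:>7,}")
```

The rounded results reproduce the last column of Table I; for ages 7-13 the adjusted figure falls below the census count of the age group, as it should.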

Before considering the application of theory, we may find it useful 
to examine the possibilities of extending the application of the ratio 
estimating procedure that was used in Table I to the adjustment of 
estimates to two sets of counts. 


AN EXAMPLE OF TWO-WAY ADJUSTMENT 


A good many of the Census sample tabulations consist of a cross 
classification of two variables from the sample for each of which the 
simple frequency distributions are available from the complete count. 
For example, the reports give for each state, standard metropolitan 
area, and city of 100,000 or more inhabitants a table derived from the 
20 per cent sample showing the age distribution of each of the following 
groups: 

(1) males in the labor force 

(2) males in the civilian labor force who are employed 

(3) males in the civilian labor force who are unemployed 

(4) males who are not in the labor force

and the same groups of females. The reports also show the complete
counts of males and females in these groups and also the age
distribution of all males and of all females. Children under 14 are
excluded from all these tabulations. The nature of the adjustments
that can be made in the sample tabulations from the census counts can
be seen in the examples given in Tables II, III, and IV.


TABLE II

EMPLOYMENT STATUS ACCORDING TO AGE OF RURAL NON-FARM
MALES IN A 20 PER CENT SAMPLE FOR NEW JERSEY AND THE
CORRESPONDING CENSUS COUNTS BY AGE AND BY EMPLOY-
MENT STATUS FROM THE 1950 CENSUS OF POPULATION

              Military    Civilian Labor Force    Not in the   Total     Total from
   Age        Labor       ---------------------   Labor        from      Census
              Force       Employed   Unemployed   Force        Sample    Count
               (1)          (2)         (3)         (4)         (5)        (6)

  14-19        7,740       4,760         730        ...        27,565     27,063
  20-24        6,660      11,705         990        ...        22,575     22,822
  25-29        3,215      15,445         680        ...        21,915     22,761
  30-34        2,170         ...         505        ...        21,605     22,555
  35-39        1,095         ...         485        ...        20,235     21,285
  40-44          385         ...         460        ...        18,180     19,388
  45-49          170         ...         440        ...        16,040     16,736
  50-54           95         ...         570        ...        15,135     15,761
  55-59           30         ...         615        ...        13,275     13,845
  60-64           10         ...         485        ...        10,655     11,621
  65-69            0         ...         415        ...         8,745      9,087
  70-74           10         ...         145        ...         6,550      6,700
  75+             15         ...          65        ...         7,225      7,646

  Total from
  sample      21,595     131,055       6,585        ...       209,700

  Total from
  census
  count       21,830     137,075       6,532        ...                  217,270

In Table II the right hand column and bottom row show the counts
to which the estimates are to be adjusted. It is evident, especially in
the case of the estimates of the number of unemployed, that adjustment
by rows will give quite different estimates from the adjustment
by columns if the ratio method that was used in Table I is applied to 
this table. The figures for the number of unemployed would differ 
by about 33 per cent. The other estimates would be much closer to 
agreement. If the adjustment is made by rows and then another ad- 
justment by columns is applied to the resulting table of estimates, 
the new row totals will no longer agree with the corresponding census 
counts. From this it is clear that the ratio method cannot be used to 
bring all the totals into agreement with the census counts simultane- 
ously, except by some modification of the procedure. 
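
The difficulty can be seen in a small numerical sketch. The 2 by 2 table and the counts below are invented for illustration and are not census figures; the function names are likewise hypothetical. The code applies the plain ratio adjustment by rows and then by columns and shows that the row totals are thrown out of agreement again.

```python
def ratio_adjust_rows(table, row_counts):
    """Scale every row so that it sums to its (census) row count."""
    return [
        [cell * count / sum(row) for cell in row]
        for row, count in zip(table, row_counts)
    ]


def transpose(table):
    """Interchange rows and columns."""
    return [list(column) for column in zip(*table)]


def ratio_adjust_columns(table, col_counts):
    """Scale every column so that it sums to its (census) column count."""
    return transpose(ratio_adjust_rows(transpose(table), col_counts))


# Hypothetical sample estimates and counts (illustration only).
sample_estimates = [[40.0, 60.0], [30.0, 70.0]]
row_counts = [110.0, 90.0]
col_counts = [80.0, 120.0]

by_rows = ratio_adjust_rows(sample_estimates, row_counts)
by_rows_then_columns = ratio_adjust_columns(by_rows, col_counts)

row_totals_after = [sum(row) for row in by_rows_then_columns]
col_totals_after = [sum(column) for column in zip(*by_rows_then_columns)]

print("row totals:   ", [round(t, 2) for t in row_totals_after], "wanted", row_counts)
print("column totals:", [round(t, 2) for t in col_totals_after], "wanted", col_counts)
```

After the second step the column totals agree with their counts but the row totals no longer do, which is why a modified procedure is needed.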

Table III shows the results of applying the ratio adjustment by rows.
The column totals can now be brought into agreement with the cor-
responding counts by modifying the adjustment in a manner that will
maintain the adjustment by rows. This can be done by distributing
the needed total adjustment in each column in proportion not to the
cell frequencies but to the adjusted row totals. The results are shown
in Table IV, as the upper of the two figures in each cell.


TABLE III

RESULTS OF MAKING RATIO ESTIMATE ADJUSTMENTS
OF THE ROWS IN TABLE II

              Ratio for      Estimates after Adjustment by Rows     Adjusted   Proportion
   Age        Adjustment     ----------------------------------     Row        for Second
              of Row         (1)       (2)        (3)     (4)       Totals     Adjustment

  14-19        .981788       ...       ...        ...     ...       27,063      .124559
  20-24       1.010941       ...       ...        ...     ...       22,822      .105040
  25-29       1.038604       ...       ...        ...     ...       22,761      .104759
  30-34       1.043971       ...       ...        ...     ...       22,555      .103811
  35-39       1.051890       ...       ...        ...     ...       21,285      .097966
  40-44       1.066447       ...       ...        ...     ...       19,388      .089235
  45-49       1.043392       ...       ...        ...     ...       16,736      .077029
  50-54       1.041361       ...       ...        ...     ...       15,761      .072541
  55-59       1.042938       ...       ...        ...     ...       13,845      .063723
  60-64       1.090662       ...       ...        ...     ...       11,621      .053486
  65-69       1.039108       ...       ...        ...     ...        9,087      .041824
  70-74       1.022901       ...       ...        ...     ...        6,700      .030837
  75+         1.058270       ...       ...        ...     ...        7,646      .035191

  Total                   21,843.4  136,766.0     ...   51,837.3   217,270
  Count                   21,830    137,075       ...   51,833     217,270
  Difference                  13.4     -309.0     ...        4.3

If the adjustment had been made first by columns and then using 
the modification of the ratio method by rows, the results would be 
those shown by the lower of the two figures in each cell of Table IV. 
A comparison of the two estimates in each cell shows the degree to 
which the two ways of applying the adjustments differ. The average of 
the two estimates will tend to be better than either of them alone. 

The modification of the ratio method that was used in this example 
is a particular case of a device given in [5, p. 169, step (11)] and applied
to the first adjustment as well as the second in [4, p. 249]. The effective- 
ness of this method of adjustment will be examined in a later section 
of this article. 

The method of making the adjustment may be expressed in general 
terms as follows: 

(1) Let the sample frequency in the $i$th row and $j$th column of the
table be $n_{ij}$ and the totals of the $i$th row and $j$th column be $n_{i.}$ and $n_{.j}$
respectively. Let $N_{i.}$ and $N_{.j}$ be the row and column totals from the
census count. Finally let $n$ and $N$ be the grand totals of the sample and
the census count respectively. Then the figures in Table II are formed
by multiplying each $n_{ij}$ by $N/n$. In Table III the ratios for the
adjustment of rows are computed by taking $N_{i.}$ and dividing it by the $i$th
row total which is equal to $n_{i.}(N/n)$.

(2) The cell frequencies of the adjusted table such as those shown
in columns (1) to (4) of Table III are then computed by multiplying
the estimates in Table II by the ratios for their respective rows. (These
results can be obtained directly by multiplying the original sample
frequencies $n_{ij}$ by $N_{i.}/n_{i.}$.) The right hand column of Table III can be
computed from the census counts or from the row totals since they now
agree. The column totals are obtained and the differences found to
complete Table III.

(3) For each column the difference between its total and the cor-
responding census count is multiplied by each of the proportions for
the second adjustment in turn and the products are then subtracted
from the numbers on the corresponding row in the same column. This
yields the final estimates, which are equal to

$$n_{ij}\,(N_{i.}/n_{i.}) - (N_{i.}/N)\Bigl[\sum_{h}\,(n_{hj}\,N_{h.}/n_{h.}) - N_{.j}\Bigr] \tag{1}$$

the summation being over $h = 1, 2, \ldots, r$ for a table of $r$ rows. A simi-
lar method is used with rows and columns interchanged when the first
adjustment is made by columns and the second by rows.
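
Steps (1) to (3) can be sketched in Python as follows. The function `two_way_adjust` is a hypothetical name for a direct transcription of formula (1), and the 2 by 2 sample frequencies and counts are invented for illustration. The sketch assumes that the row counts and the column counts add to the same grand total N.

```python
def two_way_adjust(n, row_counts, col_counts):
    """Formula (1): ratio adjustment by rows, then each column's remaining
    difference is spread over the rows in proportion to N_i. / N."""
    grand_total = sum(row_counts)  # N, assumed equal to sum(col_counts)
    # Step (2): n_ij * N_i. / n_i.
    by_rows = [
        [cell * row_count / sum(row) for cell in row]
        for row, row_count in zip(n, row_counts)
    ]
    # Step (3): adjusted column total minus census column count.
    differences = [
        sum(column) - col_count
        for column, col_count in zip(zip(*by_rows), col_counts)
    ]
    return [
        [cell - (row_count / grand_total) * d for cell, d in zip(row, differences)]
        for row, row_count in zip(by_rows, row_counts)
    ]


def transpose(table):
    """Interchange rows and columns."""
    return [list(column) for column in zip(*table)]


# Hypothetical sample frequencies and census counts (illustration only).
n = [[8.0, 12.0], [6.0, 14.0]]
row_counts = [110.0, 90.0]
col_counts = [80.0, 120.0]

rows_first = two_way_adjust(n, row_counts, col_counts)
# Same method with rows and columns interchanged.
cols_first = transpose(two_way_adjust(transpose(n), col_counts, row_counts))
# Average of the two estimates in each cell.
average = [
    [(a + b) / 2 for a, b in zip(row_a, row_b)]
    for row_a, row_b in zip(rows_first, cols_first)
]

for name, table in (("rows first", rows_first), ("columns first", cols_first), ("average", average)):
    print(name, [[round(cell, 2) for cell in row] for row in table])
```

In this example both orders of adjustment, and their average, reproduce the row counts and the column counts exactly, while the individual cells differ somewhat between the two orders.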


APPLICATION OF STATISTICAL THEORY TO THE ADJUSTMENT OF 
SAMPLE ESTIMATES 


Statistical theory offers several approaches to the problem of finding 
suitable methods of adjusting sample estimates. Among the principal 
approaches are those of the classical least squares theory as developed 
by Markoff, the principle of maximum likelihood developed by Fisher, 
and the approach of decision theory developed by Wald. The applica- 
tion of any of these approaches is complicated by the fact that the 
sampling is usually done with a considerable number of departures 
from simple random sampling, such as stratification, systematic selec- 
tion, the use of clusters as units of sampling, subsampling, variable 
probabilities of selection, weighting, etc. Consequently a model that 
faithfully represents the sampling process may be too complex to be 
utilized in any of these approaches. Moreover, the actual sampling 
operations may be affected by biases and eccentricities other than those 
that are inherent in the model. For these reasons a moderately simple
and practical method of adjustment is attainable only after some
considerable compromise with the known features of the sampling
process that produced the data and also those of the methods of
estimation which may have been employed. One is forced to approximate
the actual process by a greatly simplified model in order to find a
feasible basis for applying statistical theory to the choice of a method of
adjustment.


TABLE IV

FINAL ADJUSTMENT OF THE ESTIMATES IN TABLE II DIS-
TRIBUTING COLUMN DIFFERENCES IN PROPORTION TO ROW
TOTALS (UPPER FIGURES) AND CORRESPONDING ADJUSTMENT
MADE INTERCHANGING COLUMNS AND ROWS (LOWER FIGURES)

              Military    Civilian Labor Force    Not in the
   Age        Labor       ---------------------   Labor        Total
              Force       Employed   Unemployed   Force
               (1)          (2)         (3)         (4)

  14-19        7,598       4,712         680       14,073      27,063
               7,705       4,229         688       14,441

  20-24        6,732      11,865         ...        3,255      22,822
               6,688      11,964         ...        3,202

  25-29        3,338      16,073         ...        2,674      22,761
               3,254      16,178         ...        2,654

  30-34        2,264      17,947         ...        1,847      22,555
               2,203      18,008         ...        1,840

  35-39        1,150      17,834         ...        1,819      21,285
               1,129      17,840         ...        1,828

  40-44          409      16,739         ...        1,775      19,388
                 434      16,669         ...        1,815

  45-49          176      14,214         ...        1,909      16,736
                 174      14,239         ...        1,886

  50-54           98      12,456         ...        2,635      15,761
                  97      12,496         ...        2,602

  55-59           30      10,105         ...        3,087      13,845
                  35      10,146         ...        3,052

  60-64           10       7,635         ...        3,463      11,621
                  67       7,661         ...        3,395

  65-69          ...       4,330         ...        4,338       9,087
                 ...       4,372         ...        4,298

  70-74          ...       2,112         ...        4,439       6,700
                 ...       2,111         ...        4,443

  75+            ...       1,053         ...        6,519       7,646
                 ...       1,162         ...        6,377

  Total          ...     137,075         ...       51,833     217,270

The application of statistical theory to the adjustment of sample 
estimates is further complicated by the fact that even when the errors
of the sample data in their original form are independent, once the
estimates are subjected to conditions that equate certain sums of estimates
to given constants, the errors of the estimates are correlated. More 
than this, they form a linearly dependent set and it is necessary to 
either introduce additional variables in the form of Lagrange multipliers 
or reduce the set of estimation errors to a subset that is linearly inde- 
pendent before the application of the model proceeds. 

It is clear that the degree of success that has been attained by 
previous efforts such as those of Deming and Stephan [1] in utilizing 
least squares and by Smith [4] in applying maximum likelihood de- 
pended on the use of a relatively simple model to represent the actual 
sampling process, namely the multinomial model. This model assumes
sampling with replacement (or from an infinite universe) and is char-
acterized by the parameters $p_{ij}$, which are the probabilities that an
element (person) drawn for the sample will be of the kind that is classified
on the $i$th row and in the $j$th column. The magnitudes of the parameters
are not known but are to be estimated from the data they produce.
Stephan’s iterative procedure [5] which neglected the correlations 
among the errors can be used to apply the approach of simple decision 
theory when the loss function for the adjustment can be approximated 
well enough by a linear function of the squares of the amounts by 
which the first estimates of population cell frequencies, $(N/n)n_{ij}$, are
adjusted. This may be a substantial compromise since it attains a 
practical method at the cost of possibly inaccurate formulation of the 
loss function. Further inquiries are needed to develop simple methods 
with a better theoretical foundation that take account of the relevant 
properties of the actual sampling process. 
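
The linear dependence among the errors under the multinomial model can be illustrated with a short Python sketch. The function name `multinomial_covariance` and the cell probabilities used below are assumptions made for illustration; the formulas are the standard variances and covariances of multinomial counts, scaled by N/n to refer to the estimates $(N/n)n_{ij}$.

```python
def multinomial_covariance(p, n, N):
    """Covariance matrix of the estimates X_k = (N/n) * n_k, where the
    sample counts n_k are multinomial with cell probabilities p and
    sample size n. The cells of the table are listed in one sequence."""
    scale = N * N / n
    return [
        [scale * (pk * (1.0 - pk) if k == l else -pk * pl) for l, pl in enumerate(p)]
        for k, pk in enumerate(p)
    ]


# Hypothetical cell probabilities for a 2 by 2 table, listed row by row.
p = [0.1, 0.2, 0.3, 0.4]
cov = multinomial_covariance(p, n=100, N=500)

# Every row of the covariance matrix sums to zero, so the matrix is
# singular: the errors of the cell estimates are linearly dependent.
row_sums = [sum(row) for row in cov]
print([round(s, 9) for s in row_sums])
```

Because the estimates always add up to N, one of them is determined by the rest, which is why the set of errors has to be reduced to a linearly independent subset (or Lagrange multipliers introduced) before the theory is applied.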

Even when the problem is simplified by the assumption that the 
sampling conforms to the multinomial model, the actual computing of 
the adjustments may be laborious and other difficulties may be en- 
countered. It may be instructive to examine the successive steps in
the application of the least squares approach, starting with Markoff’s
theorem. 


MARKOFF ESTIMATES 


Markoff’s theorem on estimation deals with problems of point esti-
mation in which a number of observations, here represented by
$\|X\|$, i.e. the vector $[X_1, X_2, \ldots, X_n]$, are known to have expected
values that are linear functions of the parameters to be estimated,
$\|\vartheta\|$, and it is desired to obtain estimates which are linear functions of
the observations, $\|X\|$.

The theorem states that if the $n$ readings $\|X\|$ are uncorrelated and
have the same variance and if the relations between the expected
values of the $X$'s and the $m$ unknown parameters $\|\vartheta\|$ $(m \le n)$ can be
expressed in the form of a set of $n$ linear equations of design

$$E\|X\| = \|A\|\,\|\vartheta\| \tag{2}$$

where $\|A\|$ is an $n$ by $m$ matrix of known elements, then the estimates
$\|T\|$ of $\|\vartheta\|$ which are 1) linear in the observations, 2) unbiased, and
3) of minimum variance subject to 1 and 2 are given by

$$\|T\| = \bigl(\|A\|'\,\|A\|\bigr)^{-1}\,\|A\|'\,\|X\| \tag{3}$$

provided $\|A\|'\,\|A\|$ is nonsingular.

In the case in which the readings are correlated the theorem takes
the form [6]

$$\|T\| = \bigl(\|A\|'\,\|\sigma^{gh}\|\,\|A\|\bigr)^{-1}\,\|A\|'\,\|\sigma^{gh}\|\,\|X\| \tag{4}$$

where $\|\sigma^{gh}\|$ is the inverse of the matrix of covariances $\|\sigma_{gh}\|$, provided
the covariance matrix is nonsingular.
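
As a brief worked illustration (a sketch under assumptions not stated in the original: a single unknown parameter, so that $m = 1$ and the design matrix is a column of ones), formulas (3) and (4) reduce to familiar averages:

```latex
% One parameter, n uncorrelated readings with a common variance:
\|A\| = (1, 1, \ldots, 1)', \qquad
\|A\|'\,\|A\| = n, \qquad
\|A\|'\,\|X\| = \sum_{g=1}^{n} X_g ,
\qquad\text{so that (3) gives}\qquad
T = \frac{1}{n}\sum_{g=1}^{n} X_g .

% Uncorrelated readings with unequal variances \sigma_{gg}:
% the inverse covariance matrix is diagonal with elements 1/\sigma_{gg},
% and (4) gives the weighted mean
T = \frac{\sum_{g} X_g / \sigma_{gg}}{\sum_{g} 1 / \sigma_{gg}} .
```

In the first case the estimate is the ordinary mean of the readings; in the second, each reading is weighted inversely to its variance.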

It may happen that the variances and covariances $\sigma_{pq}$ are functions
of the unknown parameters $\|\theta\|$ and that consequently the equations of
estimation (4) do not give explicit formulas for the estimates $\|T\|$.
In such cases it is sometimes possible to obtain practical approximations
to the Markoff estimates [6] by substituting for the needed numerical
values of $\|\theta\|$ in the functions $\sigma^{pq}$ the symbols for the
estimates of these values $\|T\|$ and solving the resulting equations for
$\|T\|$. The equations to be solved will usually be quite difficult and some
approximate method for their solution, commonly iterated to improve the
approximation, will be used. These estimates give the values of the
parameters for which, given the sample, a certain function of the parameters
is minimized. In this it resembles the principle of maximum likelihood.
















Indeed, for those probability models in which minimizing this function
also maximizes the likelihood, the estimates are maximum likelihood
estimates. These practical approximations are, then,

$$\|T\| = (\|A\|'\,\|\hat\sigma^{pq}\|\,\|A\|)^{-1}\,\|A\|'\,\|\hat\sigma^{pq}\|\,\|X\| \qquad (5)$$

where $\hat\sigma^{pq}$ has the same algebraic form as $\sigma^{pq}$ with
$\|\theta\|$ replaced by $\|T\|$.

Let us consider now in general terms the application of Markoff's
theorem to the case of a contingency table of $r$ rows and $s$ columns.
Suppose that the table is the result of taking a random sample of $n$
readings in conformity with a multinomial model in which the probability
corresponding to the cell of the table on the $i$th row and in the
$j$th column is $p_{ij}$. The number of readings $n_{ij}$ that are observed
to fall into the $ij$th cell is multiplied by $N/n$ in order to obtain
estimates $X_{ij}$ of the population frequencies in a population of size $N$.
We will discuss first the case in which none of the marginal totals of the
population are known, i.e., only $N$ is known. Now if we express the expected
values $N_{ij}$ of the $X_{ij}$ as linear functions of $rs-1$ unknown
parameters $\theta_i$ in the form

$$N_{ij} = a_{ij}\theta_1 + b_{ij}\theta_2 + c_{ij}\theta_3 + \cdots + d_{ij} \qquad (6)$$

where $a_{ij}, b_{ij}, \cdots$, are known numbers given by the conditions of
the problem, and set forth to estimate these parameters, we will notice
that, before applying Markoff's theorem, the last equation giving the
expected value of $X_{rs}$ in terms of the $\theta$'s must be discarded
because it is merely a linear combination of the preceding $rs-1$ equations.
Also, since in the case of a multinomial distribution¹
$\operatorname{var} X_{ij} \propto p_{ij}(1-p_{ij})$ (where $p_{ij} = N_{ij}/N$)
and $\operatorname{cov}(X_{ij}, X_{kl}) \propto -p_{ij}p_{kl}$, it can easily
be shown that the elements of $\|\sigma^{pq}\|$ are:

$$\sigma^{pp} \propto \frac{1}{N_p} + \frac{1}{N_{rs}}, \qquad \sigma^{pq} \propto \frac{1}{N_{rs}} \quad (p \ne q), \qquad p, q = 1, 2, \cdots, rs-1.$$

Now since

$$\sum_{i=1}^{r}\sum_{j=1}^{s} N_{ij} = N,$$

whatever values the $\theta$ may take, we must have

$$\sum\sum a_{ij} = \sum\sum b_{ij} = \cdots = 0$$








1 It is assumed here that if the population is finite, its size N is large enough to permit the use of the 
multinomial model. 







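and

$$\sum\sum d_{ij} = N.$$

This means that in the first $rs-1$ equations of design $\sum a_{ij} = -a_{rs}$,
$\sum b_{ij} = -b_{rs}, \cdots$, and $\sum d_{ij} = N - d_{rs}$. We also have
$\sum X_{ij} = N - X_{rs}$. Using these relations we see that the matrices
$\|A\|'\,\|\sigma^{pq}\|\,\|A\|$ and $\|A\|'\,\|\sigma^{pq}\|\,\|X\|$
finally take the forms

$$\|A\|'\,\|\sigma^{pq}\|\,\|A\| \propto
\begin{Vmatrix}
\sum \dfrac{a_{ij}^2}{N_{ij}} & \sum \dfrac{a_{ij}b_{ij}}{N_{ij}} & \sum \dfrac{a_{ij}c_{ij}}{N_{ij}} & \cdots \\
\sum \dfrac{a_{ij}b_{ij}}{N_{ij}} & \sum \dfrac{b_{ij}^2}{N_{ij}} & \sum \dfrac{b_{ij}c_{ij}}{N_{ij}} & \cdots \\
\sum \dfrac{c_{ij}a_{ij}}{N_{ij}} & \sum \dfrac{c_{ij}b_{ij}}{N_{ij}} & \sum \dfrac{c_{ij}^2}{N_{ij}} & \cdots \\
\cdots & \cdots & \cdots & \cdots
\end{Vmatrix}$$

$$\|A\|'\,\|\sigma^{pq}\|\,\|X\| \propto
\begin{Vmatrix}
\sum \dfrac{a_{ij}(X_{ij} - d_{ij})}{N_{ij}} \\
\sum \dfrac{b_{ij}(X_{ij} - d_{ij})}{N_{ij}} \\
\sum \dfrac{c_{ij}(X_{ij} - d_{ij})}{N_{ij}} \\
\cdots
\end{Vmatrix}$$

the summations being over $i = 1, 2, \cdots, r$; $j = 1, 2, \cdots, s$.
The equations (4) can then be written in the form:

$$\begin{aligned}
T_1 \sum \frac{a_{ij}^2}{N_{ij}} + T_2 \sum \frac{a_{ij}b_{ij}}{N_{ij}} + \cdots &= \sum \frac{a_{ij}(X_{ij} - d_{ij})}{N_{ij}} \\
T_1 \sum \frac{a_{ij}b_{ij}}{N_{ij}} + T_2 \sum \frac{b_{ij}^2}{N_{ij}} + \cdots &= \sum \frac{b_{ij}(X_{ij} - d_{ij})}{N_{ij}} \\
\cdots \qquad & \qquad (rs - 1 \text{ equations})
\end{aligned} \qquad (7)$$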





or

$$\begin{aligned}
\sum \frac{a_{ij}}{N_{ij}}(X_{ij} - N'_{ij}) &= 0, \quad \text{where } N'_{ij} = a_{ij}T_1 + b_{ij}T_2 + c_{ij}T_3 + \cdots + d_{ij}, \\
\sum \frac{b_{ij}}{N_{ij}}(X_{ij} - N'_{ij}) &= 0, \\
\cdots \qquad & \qquad (rs - 1 \text{ equations}).
\end{aligned}$$

The practical approximations referred to in equation (5) will be
given by the set of $rs-1$ simultaneous equations:

$$\sum \frac{a_{ij}X_{ij}}{N'_{ij}} = 0, \qquad \sum \frac{b_{ij}X_{ij}}{N'_{ij}} = 0, \qquad \cdots \qquad (8)$$

where $N'_{ij} = a_{ij}T_1 + b_{ij}T_2 + c_{ij}T_3 + \cdots + d_{ij}$,
which are evidently equivalent to the maximum likelihood equations
of estimation. The simultaneous solution of equations (8) will give the
approximate Markoff estimates $\|T\|$. Up to this point we have been
considering the case in which only $N$ is known. When the column totals
of the expected values are known, the above setup remains unchanged
except that we must have $\sum_i a_{ij} = \sum_i b_{ij} = \cdots = 0$, and
$\sum_i d_{ij} = N_{.j}$. Also the number of parameters under estimation is
reduced to $(r-1)s$. When the row totals are also known we must have in
addition to the above constraints $\sum_j a_{ij} = \sum_j b_{ij} = \cdots = 0$
and $\sum_j d_{ij} = N_{i.}$ and the number of parameters is reduced to
$(r-1)(s-1)$.


ESTIMATION OF POPULATION CELL FREQUENCIES 


a) Two-way table with one set of population marginal totals known.
If the column totals are known then we take the unknown cell frequencies
$N_{ij}$ in the upper $r-1$ rows of the table as our $(r-1) \times s$
unknown parameters. The observations are $X_{ij} = (N/n)n_{ij}$ of which the
expected values are $N_{ij}$. Equations (8) become

$$\sum \frac{a_{ij}n_{ij}}{N'_{ij}} = 0, \qquad \sum \frac{b_{ij}n_{ij}}{N'_{ij}} = 0, \qquad \cdots \qquad ((r-1) \times s \text{ equations}) \qquad (9)$$







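where

$$a_{11} = 1, \quad a_{r1} = -1, \quad a_{i1} = 0 \quad (i \ne 1 \text{ or } r),$$

$$b_{21} = 1, \quad b_{r1} = -1, \quad b_{i1} = 0 \quad (i \ne 2 \text{ or } r),$$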


This shows that the estimates of the population cell frequencies in 
any column are proportional to the observations in the same column. 
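The proportionality can be checked numerically. The following is a minimal sketch, not taken from the paper: the function names and the small table are hypothetical, and it assumes that no sample cell or sample column total is zero. It scales each sample column to its known population total and evaluates the left side of the corresponding equation of the set (9), which for the parameter $N_{ij}$ reduces to $n_{ij}/N'_{ij} - n_{rj}/N'_{rj}$.

```python
def column_ratio_estimates(sample, column_totals):
    """Scale each sample column so that it adds up to its known population total."""
    r, s = len(sample), len(sample[0])
    col_sums = [sum(sample[i][j] for i in range(r)) for j in range(s)]
    return [[sample[i][j] * column_totals[j] / col_sums[j] for j in range(s)]
            for i in range(r)]

def likelihood_residual(sample, estimates, i, j):
    """Left side of the equation for parameter N_ij: n_ij/N'_ij - n_rj/N'_rj."""
    last = len(sample) - 1
    return sample[i][j] / estimates[i][j] - sample[last][j] / estimates[last][j]

sample = [[12, 30], [18, 20], [30, 50]]   # hypothetical sample counts n_ij
totals = [6000, 4000]                     # hypothetical known column totals N_.j
estimates = column_ratio_estimates(sample, totals)
```

Every residual vanishes for these estimates, because within a column the ratio $n_{ij}/N'_{ij}$ is the same for all rows.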

b) Two-way table with the two sets of population marginal totals known.
In this case we take as our unknown parameters the population frequencies
in the first $r-1$ rows and $s-1$ columns. Equations (8) then become

$$\sum \frac{a_{ij}n_{ij}}{N'_{ij}} = 0, \qquad \sum \frac{b_{ij}n_{ij}}{N'_{ij}} = 0, \qquad \cdots \qquad ((r-1)(s-1) \text{ equations})$$

where

$a_{11} = 1$, $a_{1s} = -1$, $a_{r1} = -1$, $a_{rs} = 1$, and all the remaining $a_{ij}$'s vanish,

$b_{12} = 1$, $b_{1s} = -1$, $b_{r2} = -1$, $b_{rs} = 1$, and all the remaining $b_{ij}$'s vanish,

$c_{13} = 1$, $c_{1s} = -1$, $c_{r3} = -1$, $c_{rs} = 1$, and all the remaining $c_{ij}$'s vanish.

The approximate Markoff estimates are therefore given by the
$(r-1)(s-1)$ equations







THE METHOD OF PROPORTIONAL DISTRIBUTION OF 
MARGINAL ADJUSTMENTS 


In the foregoing applications of least squares or maximum likelihood
the statistician is frequently presented with a considerable number of
equations to be solved simultaneously. It is natural under these conditions
to look for short cuts and approximate methods. One such that has been
suggested for use when the cell frequencies are roughly proportional is the
method of proportional distribution of marginal adjustments [4]. It proceeds
to compute the adjusted estimates $M_{ij}$ of the population frequency in
the $ij$th cell,

$$M_{ij} = \frac{N}{n}\left\{ n_{ij} + \left(\frac{N_{i.}}{N}n - n_{i.}\right)\frac{N_{.j}}{N} + \left(\frac{N_{.j}}{N}n - n_{.j}\right)\frac{N_{i.}}{N} \right\}. \qquad (11)$$

This estimate is unbiased and adds up to the known marginal totals
of the population. $M_{ij}$ has also the advantage of simplicity of
processing. Yet, in the case of small cell probabilities, the variance of
$M_{ij}$ may be larger than that of the original inflated estimate
$I_{ij} = (N/n)n_{ij}$. To illustrate this result consider the variance of
$M_{ij}$. Since $\operatorname{cov}(n_{ij}, n_{i.}) = np_{ij}q_{i.}$,
$\operatorname{cov}(n_{ij}, n_{.j}) = np_{ij}q_{.j}$ and
$\operatorname{cov}(n_{i.}, n_{.j}) = -np_{i.}p_{.j} + np_{ij}$,
the variance of $M_{ij}$ is

$$\operatorname{var} M_{ij} = \frac{N^2}{n}p_{ij}q_{ij} + \frac{N^2}{n}\{2p_{ij}(3p_{i.}p_{.j} - p_{i.} - p_{.j}) + p_{i.}p_{.j}(p_{i.} + p_{.j} - 4p_{i.}p_{.j})\}.$$

The first term on the right hand side of this equation is the variance
of $I_{ij}$. Therefore the variance of $M_{ij}$ will be greater than that of
$I_{ij}$ when

$$\{4p_{i.}p_{.j} - (p_{i.} + p_{.j})\}(2p_{ij} - p_{i.}p_{.j}) > 2p_{ij}p_{i.}p_{.j}$$

i.e.,

$$\{4 - u_{ij}\}\left(1 - \frac{p_{i.}p_{.j}}{2p_{ij}}\right) > 1 \qquad (12)$$

where $u_{ij} = (p_{i.} + p_{.j})/p_{i.}p_{.j}$.

Now $u_{ij}$ can take any value between 2 and infinity but the above
inequality (12) is not satisfied by the values of $u_{ij}$ ranging between 3
and 4 because then $[1 - (p_{i.}p_{.j}/2p_{ij})]$ would have to be greater
than a quantity which is itself greater than or equal to one. This is
naturally impossible because $p_{i.}p_{.j}/2p_{ij}$ is always positive. The
inequality is not satisfied for values of $u_{ij}$ ranging between 2 and 3
either because in that case it would be satisfied only if

$$\frac{2p_{ij}}{p_{i.}p_{.j}} - 1 > \frac{1}{3 - u_{ij}}. \qquad (13)$$

Assuming that $p_{.j}$ is the larger of $p_{i.}$ and $p_{.j}$ and since
$p_{ij} \le p_{i.}$ we must have

$$\frac{2}{p_{.j}} - 1 > \frac{1}{3 - u_{ij}}.$$

But evidently 





and consequently (13) cannot be satisfied. 
Thus the variance of $M_{ij}$ is larger than that of $I_{ij}$ when

$$4 < u_{ij} < \infty \quad \text{and} \quad p_{ij} < \frac{p_{i.}p_{.j}}{2}\left(1 - \frac{1}{u_{ij} - 3}\right). \qquad (14)$$

Condition (14) is likely to hold in tables in which some of the
marginal totals are small.

For instance in a population of 100,000 individuals if the row and
column totals corresponding to a certain cell are 40,000 each then $M_{ij}$
would have a larger variance than $I_{ij}$ if the population frequency in the
cell is less than 4,000. If the marginal frequencies are 10,000 each then
var $M_{ij}$ > var $I_{ij}$ when the population frequency in the cell is 470
or less. It should also be mentioned that this drawback is independent of
the sample size and hence would not be cured by increasing $n$.
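The two numerical illustrations can be reproduced with a short computation. The sketch below is illustrative only and is not part of the original paper; the function names are hypothetical. The first function evaluates $M_{ij}$ of equation (11) for a whole table, and the second evaluates the bound on $p_{ij}$ in (14), taking $u_{ij} = (p_{i.} + p_{.j})/p_{i.}p_{.j}$ as implied by inequality (12).

```python
def proportional_adjustment(sample, row_totals, col_totals):
    """M_ij of equation (11) for a table of sample counts n_ij."""
    N = sum(row_totals)
    n = sum(map(sum, sample))
    r, s = len(sample), len(sample[0])
    n_row = [sum(row) for row in sample]
    n_col = [sum(sample[i][j] for i in range(r)) for j in range(s)]
    return [[(N / n) * (sample[i][j]
                        + (n * row_totals[i] / N - n_row[i]) * col_totals[j] / N
                        + (n * col_totals[j] / N - n_col[j]) * row_totals[i] / N)
             for j in range(s)] for i in range(r)]

def variance_threshold(p_row, p_col):
    """Cell probability below which var M_ij > var I_ij; None if (14) cannot hold."""
    u = (p_row + p_col) / (p_row * p_col)
    if u <= 4:
        return None
    return (p_row * p_col / 2) * (1 - 1 / (u - 3))
```

For marginal proportions of .4 each the threshold is .04, i.e. 4,000 in a population of 100,000; for marginal proportions of .1 each it is about .0047, i.e. about 470.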


THE METHOD OF ARRAY PROPORTIONS 


In this section estimates based on $n_{ij}/n_{i.}$ and $n_{ij}/n_{.j}$, the
row and column proportions, will be considered. Before discussing the
estimates we have to study certain properties of these proportions. We are
going to discard the case where the array total in the sample is zero. In
practice, that would mean that if the sample table has a certain array total
of zero, we would not proceed to estimate the population cell frequencies
in the corresponding population array.

The expected value of an array proportion. 





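$$\begin{aligned}
E\left(\frac{n_{ij}}{n_{i.}} \,\Big|\, n_{i.} \ne 0\right)
&= \sum_{n_{i.}=1}^{n} \Pr\{n_{i.} \mid n_{i.} \ne 0\}\, E\left(\frac{n_{ij}}{n_{i.}} \,\Big|\, n_{i.},\ n_{i.} \ne 0\right) \\
&= \sum_{n_{i.}=1}^{n} \Pr\{n_{i.} \mid n_{i.} \ne 0\}\, \frac{p_{ij}}{p_{i.}} \\
&= \frac{p_{ij}}{p_{i.}}(1 - q_{i.}^{\,n})^{-1}(1 - q_{i.}^{\,n}) = \frac{p_{ij}}{p_{i.}}
\end{aligned} \qquad (15)$$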

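Row (or column) proportions not belonging to the same row (or column)
are uncorrelated.
To prove this result it will be shown that

$$E\left(\frac{n_{ij}}{n_{i.}}\,\frac{n_{kl}}{n_{k.}}\right) = E\left(\frac{n_{ij}}{n_{i.}}\right) E\left(\frac{n_{kl}}{n_{k.}}\right)$$

where $k \ne i$ and $n_{i.}, n_{k.} \ne 0$.

$$E\left(\frac{n_{ij}}{n_{i.}}\,\frac{n_{kl}}{n_{k.}}\right) = \sum_{n_{i.},\,n_{k.}=1} \Pr\{n_{i.}, n_{k.}\}\, E\left(\frac{n_{ij}}{n_{i.}}\,\frac{n_{kl}}{n_{k.}} \,\Big|\, n_{i.}, n_{k.}\right)$$

on both sides of which it is understood that $n_{i.}, n_{k.} \ne 0$.
Hence

$$\begin{aligned}
E\left(\frac{n_{ij}}{n_{i.}}\,\frac{n_{kl}}{n_{k.}}\right)
&= \sum_{n_{i.},\,n_{k.}} \Pr\{n_{i.}, n_{k.}\}\,(1 - \Pr\{n_{i.}n_{k.} = 0\})^{-1}\, \frac{p_{ij}}{p_{i.}}\,\frac{p_{kl}}{p_{k.}} \\
&= (1 - \Pr\{n_{i.}n_{k.} = 0\})^{-1}\, \frac{p_{ij}}{p_{i.}}\,\frac{p_{kl}}{p_{k.}}\,(1 - \Pr\{n_{i.}n_{k.} = 0\}) \\
&= \frac{p_{ij}}{p_{i.}}\,\frac{p_{kl}}{p_{k.}} = E\left(\frac{n_{ij}}{n_{i.}}\right) E\left(\frac{n_{kl}}{n_{k.}}\right).
\end{aligned}$$

Evidently this relation can be extended to any number of proportions.
We thus see that

$$\operatorname{cov}(n_{ij}/n_{i.},\ n_{kl}/n_{k.}) = 0. \qquad (16)$$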


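Correlation of array proportions belonging to the same row (or column).
Adopting the multinomial model, i.e., assuming that the population
is large, we have

$$\begin{aligned}
\Pr\{n_{ij}, n_{il}, n_{i.}\} &= \Pr\{n_{i.}\}\, \Pr\{n_{ij}, n_{il} \mid n_{i.}\} \\
&= \frac{n!\,(p_{i.})^{n_{i.}}(1 - p_{i.})^{n - n_{i.}}}{(n_{i.})!\,(n - n_{i.})!} \cdot
\frac{(n_{i.})!}{(n_{ij})!\,(n_{il})!\,(n_{i.} - n_{ij} - n_{il})!}
\left(\frac{p_{ij}}{p_{i.}}\right)^{n_{ij}} \left(\frac{p_{il}}{p_{i.}}\right)^{n_{il}}
\left(1 - \frac{p_{ij}}{p_{i.}} - \frac{p_{il}}{p_{i.}}\right)^{n_{i.} - n_{ij} - n_{il}}
\end{aligned}$$

so that

$$\begin{aligned}
E\left(\frac{n_{ij}}{n_{i.}}\,\frac{n_{il}}{n_{i.}}\right)
&= \frac{p_{ij}}{p_{i.}}\,\frac{p_{il}}{p_{i.}}\,(1 - q_{i.}^{\,n})^{-1} \sum_{n_{i.}=1}^{n} \frac{n_{i.} - 1}{n_{i.}}\, \frac{n!\,(p_{i.})^{n_{i.}}(1 - p_{i.})^{n - n_{i.}}}{(n_{i.})!\,(n - n_{i.})!} \\
&\qquad \times \sum_{n_{ij},\,n_{il}=1} \frac{(n_{i.} - 2)!}{(n_{ij} - 1)!\,(n_{il} - 1)!\,(n_{i.} - n_{ij} - n_{il})!}
\left(\frac{p_{ij}}{p_{i.}}\right)^{n_{ij}-1} \left(\frac{p_{il}}{p_{i.}}\right)^{n_{il}-1}
\left(1 - \frac{p_{ij}}{p_{i.}} - \frac{p_{il}}{p_{i.}}\right)^{n_{i.} - n_{ij} - n_{il}} \\
&= \frac{p_{ij}}{p_{i.}}\,\frac{p_{il}}{p_{i.}}\,(1 - q_{i.}^{\,n})^{-1} \left\{ (1 - q_{i.}^{\,n}) - \sum_{n_{i.}=1}^{n} \frac{n!\,(p_{i.})^{n_{i.}}(1 - p_{i.})^{n - n_{i.}}}{n_{i.}\,(n_{i.})!\,(n - n_{i.})!} \right\} \\
&= \frac{p_{ij}}{p_{i.}}\,\frac{p_{il}}{p_{i.}} \left(1 - E\frac{1}{n_{i.}}\right)
\end{aligned}$$

and consequently

$$\operatorname{cov}\left(\frac{n_{ij}}{n_{i.}},\ \frac{n_{il}}{n_{i.}}\right) = -\frac{p_{ij}}{p_{i.}}\,\frac{p_{il}}{p_{i.}}\, E\frac{1}{n_{i.}}. \qquad (17)$$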


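Variance of the array proportion $n_{ij}/n_{i.}$
Since

$$\Pr\{n_{ij}, n_{i.}\} = \frac{n!\,p_{i.}^{\,n_{i.}}(1 - p_{i.})^{n - n_{i.}}}{(n_{i.})!\,(n - n_{i.})!} \times \frac{(n_{i.})!}{(n_{ij})!\,(n_{i.} - n_{ij})!} \left(\frac{p_{ij}}{p_{i.}}\right)^{n_{ij}} \left(1 - \frac{p_{ij}}{p_{i.}}\right)^{n_{i.} - n_{ij}}, \text{ say,}$$

$$\begin{aligned}
E\left(\frac{n_{ij}}{n_{i.}}\right)^2 &= \frac{p_{ij}}{p_{i.}}\,(1 - q_{i.}^{\,n})^{-1} \sum_{n_{i.}=1}^{n} \frac{1}{n_{i.}}\, \frac{n!\,p_{i.}^{\,n_{i.}}(1 - p_{i.})^{n - n_{i.}}}{(n_{i.})!\,(n - n_{i.})!} \\
&\qquad \times \sum_{n_{ij}=1}^{n_{i.}} n_{ij}\, \frac{(n_{i.} - 1)!}{(n_{ij} - 1)!\,(n_{i.} - n_{ij})!} \left(\frac{p_{ij}}{p_{i.}}\right)^{n_{ij} - 1} \left(1 - \frac{p_{ij}}{p_{i.}}\right)^{n_{i.} - n_{ij}}.
\end{aligned}$$

The second summation

$$\begin{aligned}
&= 1 + \frac{p_{ij}}{p_{i.}}(n_{i.} - 1) \sum_{n_{ij}=2}^{n_{i.}} \frac{(n_{i.} - 2)!}{(n_{ij} - 2)!\,(n_{i.} - n_{ij})!} \left(\frac{p_{ij}}{p_{i.}}\right)^{n_{ij} - 2} \left(1 - \frac{p_{ij}}{p_{i.}}\right)^{n_{i.} - n_{ij}} \\
&= 1 + \frac{p_{ij}}{p_{i.}}(n_{i.} - 1) = n_{i.}\frac{p_{ij}}{p_{i.}} + 1 - \frac{p_{ij}}{p_{i.}}
\end{aligned}$$

and thus

$$E\left(\frac{n_{ij}}{n_{i.}}\right)^2 = \frac{p_{ij}}{p_{i.}}\,(1 - q_{i.}^{\,n})^{-1} \left\{ \frac{p_{ij}}{p_{i.}}(1 - q_{i.}^{\,n}) + \left(1 - \frac{p_{ij}}{p_{i.}}\right)(1 - q_{i.}^{\,n})\, E\frac{1}{n_{i.}} \right\}$$

$$\operatorname{var} \frac{n_{ij}}{n_{i.}} = \frac{p_{ij}}{p_{i.}} \left(1 - \frac{p_{ij}}{p_{i.}}\right) E\frac{1}{n_{i.}}. \qquad (18)$$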
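Results (15) and (18) can be verified by direct enumeration for a small sample size. The following is a rough sketch under the multinomial model and is not part of the original paper; the function names are hypothetical, and the row total $n_{i.}$ is treated as binomial with the cell count $n_{ij}$ binomial within it.

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def array_proportion_moments(n, p_row, p_cell):
    """Exact mean and variance of n_ij/n_i. given n_i. > 0, plus E(1/n_i.)."""
    pi = p_cell / p_row                  # conditional probability p_ij / p_i.
    norm = 1 - (1 - p_row) ** n          # Pr{n_i. > 0}
    mean = second = inv = 0.0
    for m in range(1, n + 1):            # m plays the role of n_i.
        w = binom_pmf(m, n, p_row) / norm
        inv += w / m
        for k in range(m + 1):           # k plays the role of n_ij
            prob = w * binom_pmf(k, m, pi)
            mean += prob * k / m
            second += prob * (k / m) ** 2
    return mean, second - mean**2, inv
```

For any admissible inputs the returned mean should agree with $p_{ij}/p_{i.}$ and the returned variance with $(p_{ij}/p_{i.})(1 - p_{ij}/p_{i.})E(1/n_{i.})$, apart from rounding error.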
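Estimates based on array proportions.
Consider the estimates

$$L_{ij} = \frac{n_{ij}}{n_{.j}}N_{.j} + \frac{N_{.j}}{N}\left(N_{i.} - \sum_k \frac{n_{ik}}{n_{.k}}N_{.k}\right). \qquad (19)$$

2 In this case we could take as our estimate either $L_{ij}$ or

$$L'_{ij} = \frac{n_{ij}}{n_{i.}}N_{i.} + \frac{N_{i.}}{N}\left(N_{.j} - \sum_k \frac{n_{kj}}{n_{k.}}N_{k.}\right).$$

The average of $L_{ij}$ and $L'_{ij}$ will have the advantage that its standard
deviation is smaller than the average of those of $L_{ij}$ and $L'_{ij}$.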





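The expected value of

$$L_{ij} = \frac{p_{ij}}{p_{.j}}N_{.j} + \frac{N_{.j}}{N}\left(N_{i.} - \sum_k \frac{p_{ik}}{p_{.k}}N_{.k}\right) = N_{ij} + \frac{N_{.j}}{N}(N_{i.} - N_{i.}) = N_{ij}.$$

$L_{ij}$ is therefore unbiased. It also adds up to the column and row
totals since

$$L_{.j} = \sum_i L_{ij} = N_{.j} + N_{.j} - N_{.j} = N_{.j}$$

and

$$L_{i.} = \sum_j L_{ij} = \sum_j \frac{n_{ij}}{n_{.j}}N_{.j} + N_{i.} - \sum_j \frac{n_{ij}}{n_{.j}}N_{.j} = N_{i.}.$$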
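As a quick check of the additivity just shown, the estimate can be computed for a small table. This is a sketch only and is not part of the original paper: the function name and the figures used with it are hypothetical, and it assumes that the estimate has the form $L_{ij} = (n_{ij}/n_{.j})N_{.j} + (N_{.j}/N)(N_{i.} - \sum_k (n_{ik}/n_{.k})N_{.k})$ and that every sample column total is nonzero.

```python
def array_proportion_estimates(sample, row_totals, col_totals):
    """L_ij built from column proportions, then adjusted to the known row totals."""
    N = sum(col_totals)
    r, s = len(sample), len(sample[0])
    n_col = [sum(sample[i][j] for i in range(r)) for j in range(s)]
    # column-ratio estimates (n_ij / n_.j) N_.j
    ratio = [[sample[i][j] * col_totals[j] / n_col[j] for j in range(s)]
             for i in range(r)]
    # shortfall of each row of ratio estimates against the known row total
    gap = [row_totals[i] - sum(ratio[i]) for i in range(r)]
    return [[ratio[i][j] + col_totals[j] / N * gap[i] for j in range(s)]
            for i in range(r)]
```

Summing the result over rows or over columns should return the known marginal totals, whatever the sample counts.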


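The sampling variance of $L_{ij}$ is

$$\operatorname{var} L_{ij} = (N_{.j})^2 \operatorname{var}\frac{n_{ij}}{n_{.j}} + \left(\frac{N_{.j}}{N}\right)^2 \operatorname{var}\left(\sum_k \frac{n_{ik}}{n_{.k}}N_{.k}\right) - 2\frac{N_{.j}^2}{N} \operatorname{cov}\left(\frac{n_{ij}}{n_{.j}},\ \sum_k \frac{n_{ik}}{n_{.k}}N_{.k}\right)$$

but

$$\begin{aligned}
\operatorname{var}\left(\sum_k \frac{n_{ik}}{n_{.k}}N_{.k}\right)
&= \sum_k N_{.k}^2 \operatorname{var}\frac{n_{ik}}{n_{.k}} + \sum_{k \ne m} N_{.k}N_{.m} \operatorname{cov}\left(\frac{n_{ik}}{n_{.k}},\ \frac{n_{im}}{n_{.m}}\right) \\
&= \sum_k N_{.k}^2 \operatorname{var}\frac{n_{ik}}{n_{.k}} \\
&= N^2 \sum_k p_{ik}(p_{.k} - p_{ik})\, E\frac{1}{n_{.k}} \qquad \text{by (16) and (18),}
\end{aligned}$$

and

$$\operatorname{cov}\left(\frac{n_{ij}}{n_{.j}},\ \sum_k \frac{n_{ik}}{n_{.k}}N_{.k}\right) = N_{.j} \operatorname{var}\frac{n_{ij}}{n_{.j}} = \frac{N^2}{N_{.j}}\, p_{ij}(p_{.j} - p_{ij})\, E\frac{1}{n_{.j}}$$

by (16) and (18).
Consequently,

$$\operatorname{var} L_{ij} = N^2 \left\{ p_{ij}(p_{.j} - p_{ij})(1 - 2p_{.j})\, E\frac{1}{n_{.j}} + p_{.j}^2 \sum_k p_{ik}(p_{.k} - p_{ik})\, E\frac{1}{n_{.k}} \right\}. \qquad (20)$$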




Consider first the case where the column totals only are known. The
above estimate then becomes

$$L_{ij} = (n_{ij}/n_{.j})N_{.j}$$

which is obviously unbiased and adds up to the column totals $N_{.j}$.
To compare between the variance of $L_{ij}$ and that of the inflated
estimate $I_{ij}$ we are going to use an upper bound for $E(1/n_{.j})$.
Stephan [6] gives several formulas for upper bounds of $E(1/n_{.j})$.
We are going to use here the upper bound

$$U = u_1 + u_2 + u_3 + u_4 + \cdots + (t + 1)u_t$$

where

$$u_1 = \frac{1 - k}{(n + 1)p}, \qquad u_r = \frac{(r - 1)u_{r-1} - k/r}{(n + r)p} \quad (r > 1), \qquad k = npq^n(1 - q^n)^{-1}.$$

The following table gives the upper bounds for the expected values
of $E(1/n_{.j})$ for $n$ = 1,000, 10,000.

TABLE V

UPPER BOUNDS FOR THE EXPECTED VALUES OF $E(1/n_{.j})$
WHEN n = 1,000, 10,000.

              n = 1,000        n = 10,000
  $p_{.j}$    Upper bound      Upper bound

  .1          .010091761       .001000902
  .2          .005020182       .000500200
  .3          .003341156       .000333411
  .4          .002503765       .000250038
  .5          .002002005       .000200020
  .6          .001667780       .000166678
  .7          .001429185       .000142863
  .8          .001250313       .000125003
  .9          .001111235       .000111112
  .01         . 113007         .010101031
  .05         .020395762       .002003815
  .001                         . 112090
  .005                         .020414916

Thus, according to the above upper limit $U$, the variance of $L_{ij}$
will be less than that of $I_{ij}$ as long as
$p_{ij} > (np_{.j}U - 1)/(nU - 1)$, which is practically always true. When
$n$ = 1,000, var $L_{ij}$ would be larger only if the expected frequency in
the cell under consideration is less than one individual for any value of
$p_{.j}$ as low as .05. When $p_{.j}$ = .01 the corresponding expected
frequency becomes 1.2 individuals. In the case $n$ = 10,000 the variance of
$L_{ij}$ is less than that of $I_{ij}$ as long as the expected cell frequency
is not less than one individual, even if $p_{.j}$ is as low as .001.
Similarly in the case $n$ = 100,000 the above inequality is true as long as
the expected cell frequency is not less than one individual, even if
$p_{.j}$ is as low as .0001. In the asymptotic case where
$E(1/n_{.j}) = 1/np_{.j}$ the variance of $L_{ij}$ will always be less than
that of $I_{ij}$, for any $p_{.j}$.
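The upper bound and the resulting comparison can be evaluated directly. The sketch below is illustrative and not part of the original paper: it assumes the recurrence $u_1 = (1-k)/(n+1)p$, $u_r = \{(r-1)u_{r-1} - k/r\}/(n+r)p$ with $k = npq^n(1-q^n)^{-1}$, the function names are hypothetical, and the number of terms `t` is an arbitrary choice, since the paper does not state how many terms were used for Table V. A direct summation of $E(1/n_{.j})$ for a positive binomial variate is included for comparison.

```python
from math import exp, lgamma, log

def stephan_upper_bound(n, p, t=10):
    """U = u_1 + ... + u_{t-1} + (t + 1) u_t for a positive binomial variate."""
    q = 1.0 - p
    k = n * p * q**n / (1.0 - q**n)
    u = (1.0 - k) / ((n + 1) * p)            # u_1
    total = 0.0
    for r in range(2, t + 1):
        total += u                            # add u_{r-1}
        u = ((r - 1) * u - k / r) / ((n + r) * p)
    return total + (t + 1) * u                # last term carries weight t + 1

def exact_mean_reciprocal(n, p):
    """E(1/x) for x binomial(n, p) conditioned on x > 0, by direct summation."""
    q = 1.0 - p
    total = 0.0
    for x in range(1, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
                   + x * log(p) + (n - x) * log(q))
        total += exp(log_pmf) / x
    return total / (1.0 - q**n)

def min_cell_probability(n, p_col, U):
    """Smallest p_ij for which var L_ij < var I_ij, given the bound U."""
    return (n * p_col * U - 1.0) / (n * U - 1.0)
```

With $n$ = 1,000 and $p_{.j}$ = .05, the tabulated bound .020395762 gives a limiting cell probability of about .00102, that is, an expected sample frequency of about one individual.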

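The fact that $E(1/n_{.j})$ is asymptotically equal to $1/np_{.j}$ can be
seen from statistical considerations: Since $L_{ij} = (n_{ij}/n_{.j})N_{.j}$
is identical with the maximum likelihood estimate in this case, the variance
of $L_{ij}$ for large $n$ will be identical with that of the maximum
likelihood estimate.

Now, the probability element in this case is

$$f = \frac{1}{N}\left\{ N_{11}^{X_{11}}\, N_{21}^{X_{21}} \cdots (N_{.1} - N_{11} - N_{21} - \cdots)^{X_{r1}}\, N_{12}^{X_{12}} \cdots \right\}$$

where $X_{ij}$ = 0 or 1 and $\sum X_{ij} = 1$,

$$EX_{ij} = \frac{N_{ij}}{N}.$$

Since

$$\frac{\partial \log f}{\partial N_{ij}} = \frac{X_{ij}}{N_{ij}} - \frac{X_{rj}}{N_{rj}}, \qquad
\frac{\partial^2 \log f}{\partial N_{ij}^2} = -\frac{X_{ij}}{N_{ij}^2} - \frac{X_{rj}}{N_{rj}^2}, \qquad
\frac{\partial^2 \log f}{\partial N_{ij}\,\partial N_{kj}} = -\frac{X_{rj}}{N_{rj}^2}, \qquad
\frac{\partial^2 \log f}{\partial N_{ij}\,\partial N_{kl}} = 0 \quad (l \ne j),$$

it follows that

$$E\frac{\partial^2 \log f}{\partial N_{ij}^2} = -\frac{1}{N^2}\left(\frac{1}{p_{ij}} + \frac{1}{p_{rj}}\right), \qquad
E\frac{\partial^2 \log f}{\partial N_{ij}\,\partial N_{kj}} = -\frac{1}{N^2 p_{rj}}, \qquad
E\frac{\partial^2 \log f}{\partial N_{ij}\,\partial N_{kl}} = 0.$$

The covariance matrix of the maximum likelihood estimates of
$N_{11}, N_{21}, \cdots$ is

$$\|\sigma_{pq}\| = \frac{1}{n}\|c_{pq}\| = \frac{1}{n}\left\| -E\frac{\partial^2 \log f}{\partial\theta_p\,\partial\theta_q} \right\|^{-1}$$

where the $\theta$'s denote the parameters $N_{ij}$.
Substituting in

$$\left\| -E\frac{\partial^2 \log f}{\partial\theta_p\,\partial\theta_q} \right\|$$

and calculating its inverse we can easily see that the large sample variance
of the maximum likelihood estimate of $N_{ij}$ is
$(N^2/n)p_{ij}(p_{.j} - p_{ij})/p_{.j}$ and since this is equal to
$N^2 p_{ij}(p_{.j} - p_{ij})E(1/n_{.j})$, it follows that, for large $n$,

$$E(1/n_{.j}) = 1/np_{.j}.$$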

Let us consider now the case where both sets of marginal totals are
known. In this case we use the original estimate (19) of which the variance
is given by (20). We are going to show that, for large $n$, the variance of
$L_{ij}$ is smaller than that of $M_{ij}$.

Assuming that $n$ is large enough to enable us to put
$E(n/n_{.j}) = 1/p_{.j}$ we get the following formula for var $L_{ij}$

$$\operatorname{var} L_{ij} = \frac{N^2}{n}\left\{ \frac{p_{ij}(p_{.j} - p_{ij})}{p_{.j}}(1 - 2p_{.j}) + p_{.j}p_{ij}(p_{.j} - p_{ij}) + p_{.j}^2 \sum_{k \ne j} \frac{p_{ik}(p_{.k} - p_{ik})}{p_{.k}} \right\}. \qquad (21)$$

In order to clear this formula from the $p_{ik}$'s which do not appear in
the variance of $M_{ij}$ we are going to maximize (21) with respect to them.
In other words we maximize

$$\sum_{k \ne j} \frac{p_{ik}(p_{.k} - p_{ik})}{p_{.k}} \quad \text{subject to} \quad \sum_{k \ne j} p_{ik} = p_{i.} - p_{ij}. \qquad (22)$$

Differentiating with respect to $p_{ik}$ we get
$p_{ik} = \lambda p_{.k} = \{(p_{i.} - p_{ij})/(1 - p_{.j})\}p_{.k}$ (from 22).

Hence $\max \sum_{k \ne j} p_{ik}(p_{.k} - p_{ik})/p_{.k} = (p_{i.} - p_{ij})(1 + p_{ij} - p_{i.} - p_{.j})/(1 - p_{.j})$
and consequently the maximum value of var $L_{ij}$ with respect
to $p_{ik}$ is








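$$\frac{N^2}{n}\left\{ p_{ij}(p_{.j} - p_{ij})\left(p_{.j} + \frac{1}{p_{.j}} - 2\right) + (p_{i.} - p_{ij})(1 + p_{ij} - p_{i.} - p_{.j})\frac{p_{.j}^2}{1 - p_{.j}} \right\}. \qquad (23)$$

Equation (23) would be less than or equal to var $M_{ij}$ when

$$p_{ij}(p_{.j} - p_{ij})\left(p_{.j} + \frac{1}{p_{.j}} - 2\right) + (p_{i.} - p_{ij})(1 + p_{ij} - p_{i.} - p_{.j})\frac{p_{.j}^2}{1 - p_{.j}}
\le p_{ij}q_{ij} + 2p_{ij}(3p_{i.}p_{.j} - p_{i.} - p_{.j}) + p_{i.}p_{.j}(p_{i.} + p_{.j} - 4p_{i.}p_{.j})$$

or

$$p_{ij}^2\left(1 - \frac{(1 - p_{.j})^2}{p_{.j}} - \frac{p_{.j}^2}{1 - p_{.j}}\right) + 2p_{ij}p_{i.}\left(1 - 3p_{.j} + \frac{p_{.j}^2}{1 - p_{.j}}\right) - p_{i.}^2 p_{.j}\left(1 - 4p_{.j} + \frac{p_{.j}}{1 - p_{.j}}\right) \le 0$$

or

$$\frac{p_{ij}^2}{p_{.j}(1 - p_{.j})}(1 - 2p_{.j})^2 - \frac{2p_{ij}p_{i.}}{1 - p_{.j}}(1 - 2p_{.j})^2 + \frac{p_{i.}^2 p_{.j}}{1 - p_{.j}}(1 - 2p_{.j})^2 \ge 0$$

i.e.,

$$\frac{(1 - 2p_{.j})^2}{p_{.j}(1 - p_{.j})}(p_{ij} - p_{i.}p_{.j})^2 \ge 0$$

which is always true.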
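The algebra leading to this inequality can be confirmed numerically. The sketch below is not part of the original paper; the function names are hypothetical and the common factor $N^2/n$ is dropped throughout. It evaluates the large-sample variance of $M_{ij}$, the maximum value (23), and the closed form of their difference for arbitrary probabilities.

```python
def var_M(a, r, c):
    """n/N^2 times var M_ij, with a = p_ij, r = p_i., c = p_.j."""
    return a * (1 - a) + 2 * a * (3 * r * c - r - c) + r * c * (r + c - 4 * r * c)

def max_var_L(a, r, c):
    """n/N^2 times expression (23), the maximum large-sample variance of L_ij."""
    return (a * (c - a) * (c + 1 / c - 2)
            + (r - a) * (1 + a - r - c) * c**2 / (1 - c))

def gap(a, r, c):
    """Closed form of var_M - max_var_L given at the end of the derivation."""
    return (1 - 2 * c) ** 2 * (a - r * c) ** 2 / (c * (1 - c))
```

For any values tried, `var_M(a, r, c) - max_var_L(a, r, c)` should agree with `gap(a, r, c)` up to rounding error, and the gap vanishes when $p_{ij} = p_{i.}p_{.j}$ or $p_{.j} = 1/2$.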

Equation (23) and var $M_{ij}$ will be equal only when a) $p_{ij} = p_{i.}p_{.j}$
or b) $p_{.j} = \frac{1}{2}$. Thus, for large $n$, var $L_{ij}$ is always less
than that of $M_{ij}$. The variances will be equal only in the following two
cases:

a) All the cell probabilities in the row containing the cell under
consideration are proportional to their column totals, i.e.,
$p_{ij} = p_{i.}p_{.j}$ for all $j$.

b) $p_{ik} \propto p_{.k}$ for all $k \ne j$ and $p_{.j} = 1/2$.

$L_{ij}$ is thus more efficient than $M_{ij}$ and the computational labor is
almost the same for the two estimates. The domain in which var $L_{ij}$
is larger than var $I_{ij}$ and which can be defined by the two inequalities

$$p_{i.} + p_{.j} < 1$$

and

$$p_{ij} < \frac{p_{.j}}{(1 - 2p_{.j})^2}\left( \left[ p_{.j}(1 - p_{.j})(1 - p_{i.})\{p_{.j}(1 - p_{.j}) + p_{i.}(1 - 3p_{.j} + p_{.j}^2)\} \right]^{1/2} - p_{.j}(1 - p_{.j} - p_{i.}p_{.j}) \right)$$

is naturally smaller than the corresponding domain in the case of $M_{ij}$.
For instance, when $N$ = 100,000 and $N_{i.} = N_{.j}$ = 40,000, var $L_{ij}$ >
var $I_{ij}$ when the expected cell frequency is below 3,592 (4,000 in the
case of $M_{ij}$). If $N_{i.} = N_{.j}$ = 10,000 the corresponding figure will
be 390 (470 in the case of $M_{ij}$). We should not forget, however, that the
formula (23) which we have been using for the variance of $L_{ij}$ is
actually a maximum and consequently the above figures which correspond to
$L_{ij}$ will always be reduced when the true variance (20) is used.


REFERENCES 


[1] Deming, W. E., and Stephan, F. F., “On a least squares adjustment of a sam-
pled frequency table when the expected marginal totals are known,” Annals
of Mathematical Statistics, 11, No. 4 (1940), 427-44.

[2] El-Badry, M. A., Extension of Markoff’s Theorem on Estimation, Ph.D. 
Thesis, London, 1948. 

[3] Fisher, R. A., Statistical Methods for Research Workers. London: Oliver and 
Boyd. 1925. 

[4] Smith, J. H., “Estimation of linear functions of cell proportions,” Annals of
Mathematical Statistics, 18, No. 2 (1947), 231-55.

[5] Stephan, F. F., “An iterative method of adjusting sample frequency tables 
when expected marginal totals are known,” Annals of Mathematical Statistics, 
13, No. 2 (1942), 166-78. 

[6] Stephan, F. F., “The expected value and variance of the reciprocal and other 
negative powers of a positive Bernoullian variate,” Annals of Mathematical 
Statistics, 16, No. 1 (1945), 50-61. 





THE APPLICATION OF SAMPLING PROCEDURES TO 
BUSINESS OPERATIONS* 


Howard L. Jones
Illinois Bell Telephone Company 


The purpose of this short article is to give business execu- 
tives an insight into sampling theory and procedures without 
confusing them with mathematical symbols or unexplained 
technical terms. Sampling is defined, and the principal appli- 
cations in the telephone business are briefly described. The 
procedures used are grouped into three broad categories— 
judgment sampling, systematic sampling, and random sam- 
pling—the relative advantages of which are pointed out. Vari- 
ous ways of minimizing the cost of random sampling are dis- 
cussed. A word of caution is added regarding the dangers of 
improperly selected samples. 


Sampling procedures, as a business technique, have now been developed
to the point where desired information about a large
population can usually be obtained by examining a relatively small
sample. Furthermore, the reliability or precision of the sample, as it is
called, can be computed from the sample itself.

In this article, the nature and purpose of sampling are described, 
the different techniques available are discussed, and the advantages of 
scientific, or random, sampling are emphasized. A section is devoted to 
methods of minimizing the cost of a sample of this type. The material 
is based on telephone company operations; but similar applications oc- 
cur in many types of business. 


WHAT SAMPLING IS 


Sampling means selecting and studying a relatively small number of 
individuals in order to find out something about the population from 
which they are selected. These individuals may be persons, or they 
may be animate or inanimate things. For convenience, they may be 
referred to as items. The number of items in the population may be 
small, or large, or may even be infinite. 

Sampling procedures are useful in situations where examining all 
the items in a population and recording and summarizing the results of 
such examination would be expensive, or time-consuming, or both, and 
where some degree of approximation in the final results is permissible. 
In many such situations, a sample of a few hundred items, properly 





* Revision of paper presented at a luncheon meeting of the Chicago Chapter, American Statistical 
Association, October 16, 1952. 



































































































































selected, provides information of sufficient accuracy with respect to a
population of several million items. The economies obtainable by
employing small samples are frequently of considerable magnitude.
Moreover, examinations of large numbers of items can result in less accuracy
than is produced by a statistical sample properly drawn and observed
by small numbers of well-trained and closely supervised personnel.
Repetitive operations beyond a certain point produce inspection fatigue
which results in errors and thereby adversely affects the results.


PURPOSES OF SAMPLING 


When we employ sampling procedures in the telephone business we 
usually have one of three purposes in mind: 
A. To get information that is available but not ordinarily sum- 
marized. 
B. To supply a framework for making some kind of survey. 
C. To provide a basis for verifying the accuracy of the information 
shown on our records. 


An illustration of sampling with the first purpose—to obtain available 
information not ordinarily summarized—is the sampling of our custom- 
ers’ accounts to determine their local telephone usage. For most of our 
customers in Chicago, there is a tabulating card that shows the monthly 
usage of local service, and the charge for such service. We need to know 
this average monthly usage for customers in each rate classification, and 
the distribution of that usage; that is, the proportion of our customers 
that use 0-10 messages per month, the proportion using 11-20 messages 
and so on. It would be very expensive to summarize this information 
100 per cent; and so for about 25 years, we have been relying on sam- 
pling procedures. 

Samples with the second purpose mentioned—to supply a framework 
for some kind of survey—are taken periodically in connection with our 
attempts to measure the attitude of our customers toward the tele- 
phone company, particularly toward the service we furnish and the 
charges we bill for it. Accounting or plant records, and sometimes 
telephone directories, are used to select the sample customers. These 
customers are then interviewed, by telephone or otherwise, and asked 
to give answers to a set of questions. The analysis and summarization 
of these answers yields the information we want. Similar surveys have 
been taken in the last few years to measure the attitudes of supervisory 
employees toward the company and higher management. 

Another kind of sampling we do occasionally is the sampling of plant
and engineering records to provide a framework for appraising the per
cent condition of our telephone property. This condition is one of the
factors we ask the regulatory commissions to take into account in fixing
the rates we charge for telephone service.

The third purpose of sampling—to verify the accuracy of the information
shown on our records—is essentially the same as when sampling
inspection is employed in a factory to control the quality of the product.
This quality control function has two broad aspects:


1. Acceptance sampling. 
2. Process control. 


Acceptance sampling means the sampling inspection or auditing of 
work received from another department or company to insure its meet- 
ing specifications as to quality. The same techniques can also be ap- 
plied to testing the quality of work leaving a department or company, 
to insure its meeting certain standards from the point of view of the 
customer or the department receiving it. These techniques are par- 
ticularly well suited to occasional or periodic tests by independent 
auditors. 

Process control means day-to-day routine inspection or verification 
of the result of an operation in the department where it is performed, 
to see that the average quality is up to standard, and that the varia- 
tions in quality are no greater than would be expected from chance 
causes alone. Sampling plans have been found useful for the routine 
verification of clerical work of experienced operators where extreme 
accuracy is not required. 

Control charts and other quality control techniques¹ are also used
in setting objective standard levels for the quality of various operations 
and in bringing the actual quality up to these levels. These charts not 
only show the comparison between actual performance and the stand- 
ard level, but they also incorporate control limits, statistically deter- 
mined, to indicate whether or not differences between actual and 
standard performance are significant. 


TYPES OF SAMPLING PROCEDURES 


The commonly used procedures for sampling a population might be 
grouped into the following broad categories: 


A. Judgment sampling. 
B. Systematic sampling. 
C. Random sampling. 





1 Quality control techniques, as applied to factory operations, are described in several textbooks. 
Among the best known is E. L. Grant's book [3] which is intended as a manual for production and in- 
spection supervisors, engineers, and management. 







In judgment sampling, the selection of sample items is left to the 
judgment of some responsible person or persons familiar with the char- 
acteristics of the population that is being sampled and therefore pre- 
sumed to have sufficient knowledge to select representative items. If 
the characteristic we wish to investigate is closely associated with 
one or more other characteristics, such as geographical location, and 
if the distribution of the population with respect to these other char- 
acteristics is known, the sample is usually selected so that its distribu- 
tion by known characteristics will closely agree with the distribution of 
the population. The sample is then called a quota sample. 

In systematic sampling, all the items in the population are arranged 
in some kind of sequence, if they are not already so arranged, and every 
rth item is drawn into the sample. For example, if we wish our sample 
to include one per cent of the population, we start with some item 
among the first hundred, and then select every hundredth item there- 
after. If the first item is selected at random from among the first r 
items in the population, then every item in the entire population 
will have the same chance (1/r) of being selected. Systematic sampling 
with a random start may be included with random sampling in the 
general class of probability sampling where every item has a known 
probability of being drawn into the sample. 
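For readers who keep their records in machine-readable form, systematic sampling with a random start might be sketched as follows. This is an illustration only and not part of the original article; the function name is hypothetical, and the optional `rng` argument merely allows a seeded generator to be supplied so that a selection can be repeated.

```python
import random

def systematic_sample(items, r, rng=random):
    """Every r-th item of a sequence, starting at a random position among the first r."""
    start = rng.randrange(r)        # 0-based index of the first selected item
    return items[start::r]
```

Applied to a list of 1,000 serial numbers with r equal to 100, the function returns ten items spaced exactly one hundred apart, and each item in the list has the same chance (1/r) of being selected.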

In random sampling, the actual selection is nearly always determined 
by assigning a serial number to every item in the population to be 
sampled and then employing a table of random numbers. Random 
sampling is also referred to as scientific sampling. When a statistician 
speaks of “sampling” without a qualifying adjective, it is presumed that 
he refers to this kind of sampling. 

There are various subclasses of random sampling. In pure random 
sampling, every selection is made in such manner that all items in the 
population have a chance, and the same chance, of being selected. In 
stratified sampling, the population is first divided into several parts, 
or strata, and each selection is made so that only items in a particular 
stratum have a chance of being drawn into the sample at that particu- 
lar time. In this respect, stratified sampling is like quota sampling. In 
quota sampling, however, the selection within each stratum is left to 
the judgment or convenience of the person making the selection; while 
in stratified sampling, the selection within each stratum is purely ran- 
dom. 

Several telephone companies in the Bell System have made consider- 
able use of a sampling procedure consisting of the selection of random 
subsamples, where each subsample is a sample of the entire population. 







This general method of sampling, with the number of subsamples
nearly always equal to 10, has been used by W. E. Deming in sampling
surveys for Illinois Bell and other Associated Companies to determine
the per cent condition of telephone property. It was suggested by
J. W. Tukey of Princeton University and the Bell Telephone
Laboratories.²

This type of sampling is a kind of stratified sampling. To design a
plan of this kind, in its simplest form, we first assign a serial number to
every item in the population. A counting interval is then computed
from the formula

k = 10N/n

where N is the highest serial number in the population, n is the desired
number of items in the sample, and k is the required counting interval.
Fractions in the quotient are rounded off to some adjacent whole number.
To illustrate, let us suppose that the serial numbers 1, 2, 3, ...
are assigned in order to a population consisting of 99,960 items. Suppose
we wish to have 500 items in our sample. Then the counting interval
might be computed as

k = (10 × 99,960)/500 = 1,999.2.

This could be rounded off to the next higher integer, or 2,000.

The next step is to divide the population into slices, with the first 
k items in the first slice, the second k items in the second slice, and so 
on. In other words, if the counting interval were 2,000, we would include 
items numbered 1 to 2,000 in the first slice, items 2,001 to 4,000 in the 
second slice, etc. 

Next we select 10 sample items from each slice. For the first slice, 
the 10 items are selected by employing a table of random numbers. 
At this point, we may choose either of two further procedures. 

One procedure is to add multiples of k to the serial numbers of the
sample items in the first slice. Suppose we let a, b, c, ..., j represent
the random serial numbers of the sample items in the first slice. Then
the sample items in the second slice will be those with serial numbers
a+k, b+k, c+k, ..., j+k. The sample items in the third slice will be
those with serial numbers a+2k, b+2k, c+2k, ..., j+2k. After all
the sample items have been selected in this manner, they are combined





2 See [2, pp. 96 and 353].





768 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1955 


into ten subsamples, the first subsample consisting of items numbered
a, a+k, a+2k, ...; the second subsample consisting of items numbered
b, b+k, b+2k, ...; the tenth consisting of items numbered
j, j+k, j+2k, .... As a result, each of the ten subsamples will include
one sample item from each of the slices into which the population
has been divided, except for a possible slight variation in the last
slice.

An alternative method, frequently used, is like the foregoing in that 
each subsample contains one item from each slice. But the ten sample 
items for each slice are selected at random, and not by adding multiples 
of k to the serial numbers of the sample items in the first slice. These 
ten sample items are assigned to subsamples in order of random selec- 
tion. 
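
The two selection procedures just described might be sketched as follows. The function names are hypothetical, and two details are assumptions rather than statements from the text: the random starts in the first slice are drawn without duplicates, and in the alternative method serial numbers above N in the last slice are dropped without substitution.

```python
import random

def replicated_systematic(N, k, subsamples=10, rng=random):
    """First procedure: random starts a, b, ..., j in slice 1, then add multiples of k."""
    starts = rng.sample(range(1, k + 1), subsamples)
    # Subsample for start s holds s, s+k, s+2k, ... up to the highest serial number N.
    return [[s + m * k for m in range((N - s) // k + 1)] for s in starts]

def random_within_slices(N, k, subsamples=10, rng=random):
    """Alternative: fresh random draws in every slice, assigned in order of selection."""
    out = [[] for _ in range(subsamples)]
    for lo in range(1, N + 1, k):
        picks = rng.sample(range(lo, lo + k), subsamples)
        for sub, item in zip(out, picks):
            if item <= N:            # serial numbers beyond N are discarded, not replaced
                sub.append(item)
    return out
```

In both sketches every subsample is itself spread across the whole population, which is what allows the subsample averages to be compared with one another later.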


RELATIVE MERITS OF DIFFERENT SAMPLING PROCEDURES 


The best sampling procedure for a particular problem will usually 
depend on two considerations: 


1. The cost of selecting and inspecting the sample and then analyzing 
the results. 
2. The desired reliability of the results. 


Let us compare the various classes of procedures from these two stand- 
points. 

Judgment sampling is generally speedy, and when an emergency 
arises requiring an immediate answer, we may have no practical alter- 
native to this procedure. Or when we want only a very rough idea about 
some characteristic of a population, the time and added expense of 
carefully designing and selecting a probability sample may not be war- 
ranted. Thus, in the process control of keypunching errors, the selection 
of consecutive items on a judgment basis seems to have important 
practical advantages in many situations. 

The reliability of a judgment sample depends on the judgment and 
skill of the person or persons making the selection, and may therefore 
be very good or very bad. The difficulty is that in most situations there 
is no way to determine objectively just how reliable the sample actually 
is. The reliability is purely a matter of opinion. Because of this uncer- 
tainty, there is frequently a tendency to have the sample include a 
fairly large proportion of the population—say, 10 or 20 per cent—with 
the thought of insuring its reliability. As a result, the sampling pro- 
cedure may turn out to be costly and time-consuming, without actually 
overcoming the inherent uncertainties to any appreciable degree. As 







mentioned earlier, the fatigue resulting from a large volume of repeti- 
tive operations may also result in errors in observing and summarizing. 
A large sample is not necessarily a good sample; but it is nearly always 
an expensive sample.³

In systematic sampling, where the sample consists of every rth item 
in the population, the manner of selection is simple, and easy to explain 
to the average office worker who will do the sampling. It may therefore 
be the most practical procedure where a continuous sample must be 
selected by clerical forces on the basis of written instructions issued by 
the statistician. This procedure is used, for example, in continuous sam- 
ples of telephone toll messages. 

The precision of the result obtained from systematic sampling can 
be determined from the sample data themselves, provided we know 
or can safely assume that there is no relationship between the char- 
acteristic of the population we are trying to investigate and the order 
in which the items in the population are arranged.⁴ In many situations,
however, an assumption of this kind would not be correct. For example, 
if we were to study the telephone usage over customers’ lines by selecting
every hundredth line in telephone number order, the result if the
first number and every subsequent number in the sample ended in the
digits 00 would be substantially different from the result if the first
and every subsequent number ended in the digits 99. The assumption
that there is no relationship between the arrangement of the popula- 
tion and some particular characteristic should be avoided, if possible, 
unless tests have shown this assumption to be reasonably correct. 

Random sampling, when the procedure has been carefully designed 
by a competent statistician, has the unique advantage that the pre- 
cision of the final results can be determined objectively without making 
questionable assumptions.⁵ In other words, we can compute a precision
range and state that if all the items in the population were examined, 
the result would fall within that range, with the knowledge that state- 
ments of this kind will be correct practically every time. This knowledge 
rests on the theory of probability, which has been derived mathemati- 
cally and verified again and again through actual experience as a useful 
theory in practical sampling problems. 

Originally, the one principal objection to random sampling was the 





3 For further comments, see [2, pp. 9-11].

4 The precision can also be computed if we know the relationship between the serial numbers and the
degree of correlation between the various items in the population; but in many situations, we know even
less about this relationship than we know about the property we are investigating by sampling procedures.
A good technical discussion of systematic sampling is given in [1, Chapter 8]. See also [4 and 5].

5 The discussion on “confidence limits” in [1] may be helpful in avoiding questionable assumptions.
See also [4].







cost of carefully designing a satisfactory procedure and then carrying 
out this design in selecting the sample. This was especially true in pure 
random sampling. This objection has now been largely overcome with- 
out impairing the unique advantage of scientific sampling with respect 
to reliability. The improvements which have been made can be incor- 
porated in a random sampling plan with random subsamples, which 
also has some further advantages peculiar to itself. In the last two or 
three years, there have been several instances where a sample of this 
type has proved to be less expensive than a judgment sample or a 
systematic sample which had been employed in previous surveys of the 
same kind. 


MINIMIZING THE COST OF RANDOM SAMPLING 


The improvements in pure random sampling that have reduced the 
costs of this sampling procedure might be summarized as follows: 
1. Improved numbering methods.
2. Sampling without replacement.
3. Stratification.
4. Disproportionate sampling.
5. Controlled sampling.
6. Multistage sampling.
7. Cluster sampling.
8. Random subsampling.


The method of numbering the population sometimes affects the cost 
of selecting the sample items. It is necessary, of course, that serial num- 
bers be assigned so that when a particular number is selected by using a 
table of random numbers, no question can arise as to which item in the 
population corresponds to that number. But it is not necessary that 
every serial number actually be assigned to one of these items. To illus- 
trate, suppose we wish to select a sample of residence telephone cus- 
tomers in Chicago. The most convenient procedure is to take a random 
sample of all the telephone numbers in Chicago, and then discard 
those that are not associated with residence customers. Where the same 
customer has more than one telephone number, we also discard auxiliary 
numbers not associated with the main service so as to give this cus- 
tomer the same chance of being selected as a customer with just one 
telephone number. If this method of numbering and selecting is fol- 
lowed, however, it is necessary to observe the precaution of making no 
substitutions to replace serial numbers that are not associated with 
items in the population in which we are interested; otherwise, the prob- 
ability of selection is likely to vary from item to item. 







The method of numbering usually creates no special difficulty in 
determining the sample size. If we wish to have the sample include one 
per cent of the residence telephone numbers, for example, we design 
our sample so as to include one per cent of all the telephone numbers. 
The number of residence telephones actually included in the sample 
will then be approximately one per cent of the total residence tele- 
phones. Any slight discrepancy between the actual size of the sample 
and the expected size is usually not important. In exceptional situa- 
tions, where a minimum sample size is necessary, we can oversample 
by about twice the square root of the desired sample size, and then 
discard any excess items on a random basis. 
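
A small sketch of this oversampling rule follows. The names `draws_needed` and `trim_to_target`, and the `eligible_fraction` parameter (the share of serial numbers that belong to the population of interest), are hypothetical; the text gives only the rule of adding about twice the square root of the desired sample size.

```python
import math
import random

def draws_needed(target, eligible_fraction=1.0):
    """Serial numbers to draw so that eligible items rarely fall short of target."""
    padded = target + 2 * math.sqrt(target)   # oversample by about twice the square root
    return math.ceil(padded / eligible_fraction)

def trim_to_target(items, target, rng=random):
    """Discard any excess items on a random basis."""
    return rng.sample(items, target) if len(items) > target else list(items)

draws = draws_needed(400, eligible_fraction=0.5)   # 400 + 2*20 = 440 eligible expected
```

One plausible reading of the rule is that the number of eligible items obtained varies from sample to sample with a standard deviation of no more than about the square root of the sample size, so that a margin of two such deviations is seldom exhausted.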

Sampling without replacement, instead of sampling with replacement, 
is always employed. In other words, we avoid duplications in selecting 
the sample. Formulas are available to determine the effect on the precision
of the over-all mean of the sample.⁶
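
For reference, the standard textbook form of this effect for a simple random sample of n items drawn without replacement from N, as given in sampling texts such as [1], is shown below. The symbols S² (population variance) and ȳ (sample mean) are notation assumed here and do not appear in the article.

$$\operatorname{Var}(\bar{y}) = \frac{S^2}{n}\left(1 - \frac{n}{N}\right)$$

For a sample of 500 from 99,960 items the factor in parentheses is about 0.995, so the gain in precision is slight unless the sample is a sizable fraction of the population.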

Stratification is important if various strata in the population differ 
considerably from one another in their average values or variability, 
or in the unit cost of selecting and processing. When the population is 
divided into slices as previously described, the number of strata is equal 
to the number of slices, and is therefore large in most situations. To 
take full advantage of the greater precision which this makes possible, 
the method of numbering the items in the population should be such 
that similar items tend to be adjacent to one another. For example, 
in numbering telephone customers for a sampling survey of some char- 
acteristic that may be closely associated with the size of the com- 
munity, the communities should first be arranged in order of size before 
assigning serial numbers to subscribers in those communities. 

Disproportionate sampling is usually advisable if there is a consider- 
able difference between strata with respect to the variability of the 
characteristic we are investigating, or with respect to the cost of in- 
specting the items to measure this characteristic or the importance of 
the items in the final conclusions. Formulas are available to determine 
the most economical way to apportion the sample over these various 
strata.⁷ In some cases, it may be desirable to inspect all the items in
some unusual stratum. For example, in sampling telephone usage per 
customer, it is usually desirable to summarize the data 100 per cent 
for customers with exceptionally high usage. 
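
As an illustration of the kind of formula referred to, the sketch below uses the usual textbook rule for optimum allocation, in which the sample in each stratum is made proportional to the stratum size times its standard deviation, divided by the square root of the unit cost. The function name and the example figures are hypothetical and are not taken from the article.

```python
import math

def optimum_allocation(n, sizes, sds, costs):
    """Split a total sample n over strata in proportion to N_h * S_h / sqrt(c_h)."""
    weights = [N * s / math.sqrt(c) for N, s, c in zip(sizes, sds, costs)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Hypothetical figures: a small stratum of high-usage customers with nine times
# the variability of the rest receives as large a sample as the large stratum.
alloc = optimum_allocation(500, sizes=[9000, 1000], sds=[1.0, 9.0], costs=[1.0, 1.0])
```

When the computed allocation for a stratum approaches or exceeds the number of items it contains, the stratum is simply inspected 100 per cent, as in the high-usage example above.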

In controlled sampling, the size of the sample for each stratum or 
class of items in the population, as well as the size of the total sample, 





6 See discussion of the “finite population correction” in various places in [1, 5].
7 See remarks on “optimum allocation” in [1, 2, 4, and 5].







is rigidly specified in advance. When such specification is made on the 
basis of exact knowledge of the total number of items in each class, the 
precision of the sample results will be greater than when the distribu- 
tion of sample items among different classes is left to chance. On the 
other hand, there are two advantages in not rigidly specifying the 
sample sizes by classes of items: 


1. If the number of sample items for each class is not rigidly specified,
the actual distribution of the sample items may be compared
with the known distribution of the population as a check on the 
sampling procedure. Some such check is always desirable. 

2. There are sometimes errors in records purporting to show the 
actual numbers of items for various classes. By not specifying the 
sample sizes by classes in advance, the assumption that the rec- 
ords are correct is avoided. Moreover, the existence of actual 
errors in the records may sometimes be disclosed. 


It should also be mentioned that when sampling by the use of count- 
ing intervals is employed, the greater precision resulting from specify- 
ing the sample sizes by classes will usually be of negligible importance, 
provided the population has first been arranged according to these 
classes before serial numbers are assigned. 

While controlled sampling is practically never desirable, it is never- 
theless often desirable to vary the sample proportions between classes. 
This variation is sometimes most easily accomplished by varying the 
counting interval from class to class so that the slices of the popula- 
tion are narrower for classes where a higher sample proportion is indi- 
cated. Another procedure is to oversample, using the same counting 
interval for all classes, and then subsequently discard a fixed proportion 
of certain classes on a random basis. In a sample of telephone poles, for 
example, the sample poles were divided between those with cable at- 
tachments and those without, and only 25 per cent of those without 
cable attachments were actually inspected. Thinning the sample in this 
way may be employed when there is little or no information before- 
hand as to the relative sizes of various classes, or where it is difficult 
to separate the population into different classes before assigning serial 
numbers. 
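
The thinning procedure can be sketched as below. The function `thin` and its arguments are hypothetical, and the weighting of retained items by the inverse of the retained fraction is an assumption about how the results would be combined afterwards; the article describes only the random discard.

```python
import random

def thin(sample, in_thinned_class, keep_fraction, rng=random):
    """Keep every item outside the thinned class and a random fraction of those in it.

    Returns (item, weight) pairs; thinned items carry a weight that restores
    their class to its original share of the sample.
    """
    kept = [x for x in sample if not in_thinned_class(x)]
    pool = [x for x in sample if in_thinned_class(x)]
    n_keep = round(keep_fraction * len(pool))
    chosen = rng.sample(pool, n_keep)
    weight = len(pool) / n_keep if n_keep else 0.0
    return [(x, 1.0) for x in kept] + [(x, weight) for x in chosen]

# Hypothetical sample of 100 poles, 40 of them without cable attachments.
poles = [("cable", i) for i in range(60)] + [("no_cable", i) for i in range(40)]
inspected = thin(poles, lambda p: p[0] == "no_cable", 0.25)
```

Here only 10 of the 40 poles without cable attachments are inspected, each standing for four poles of its class.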

Multistage sampling, as opposed to single-stage sampling, is another 
device for reducing the cost of selecting the sample and making the 
subsequent inspections. In designing a sample of dwelling units in 
Chicago, the procedure was, first, to select sample blocks, and then to 
select a sample of the “dwelling locations” in each block, a dwelling













location being defined so as to include from one to six dwelling units. 
For example, a dwelling location might be a single dwelling house, or a 
three-flat building, or all the apartments at a particular street address. 
All units at a particular dwelling location were included in the sample. 
The procedure in this particular case would be called two-stage sam- 
pling, the first stage being the selection of the block, and the second 
stage the selection of the dwelling locations within the block. Since 
each dwelling location comprised a group or cluster of one to six dwelling 
units, the procedure followed in this particular case might also be called 
cluster sampling. 

The principal advantage of cluster sampling and various multistage 
procedures in field surveys is the saving in travel time. A considerable 
saving may also be effected at times in sampling internal records, if the 
procedure is such that several items are inspected in each sample drawer 
or file. There are some disadvantages, however, that need to be considered:


1. Where only the first sampling stage is under the direct supervision 
of the statistician, and subsequent stages are to be carried out on 
the basis of verbal or written instructions, there is always the 
possibility that the instructions will not be understood, or that 
they will be departed from more or less on the assumption that 
this is an area where “judgment” may be exercised. 

2. With certain exceptions, the results of multistage sampling or
cluster sampling are more or less biased. This bias is likely to be
important if the number of sample units at the first stage is small,
and if at some stage there is a high degree of correlation, positive
or negative, between the sample size and the characteristic we are
trying to measure.

3. The measurement of the precision of the sample results may be
difficult.

4. The averages of individual sample clusters are likely to be misinterpreted.
In opinion surveys, for example, if the returns for
some particular community consist of 10 questionnaires and all are
unfavorable, there will be an intuitive tendency to regard the
result as significantly bad, even though just one cluster was
involved.


It should also be pointed out that if there is a high degree of correla- 
tion within clusters, the sample size required for a given degree of pre- 
cision with multistage sampling may be considerably greater than for 
single-stage sampling. For this reason the expense of collecting and sum- 







marizing the additional information that is necessary may largely or 
altogether offset the saving in travel time. 
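
The size of this effect is often gauged by a common approximation for clusters of equal size, which is not stated in the article: the required sample is multiplied by 1 + (m − 1)ρ, where m is the cluster size and ρ the correlation within clusters. The sketch below uses that approximation; the function name and figures are hypothetical.

```python
def design_effect(cluster_size, rho):
    """Approximate factor by which cluster sampling inflates the required sample size."""
    return 1.0 + (cluster_size - 1) * rho

# Hypothetical case: clusters of 6 units with within-cluster correlation 0.5
# need roughly 3.5 times the sample of a single-stage design.
factor = design_effect(6, 0.5)
```

With no correlation within clusters the factor is 1 and nothing is lost; with clusters of one unit the design is single-stage and the factor is again 1.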

Multistage sampling, or cluster sampling, is sometimes necessary;
but the possible disadvantages should be kept in mind. 

Random subsamples, which constitute an important feature of many 
plans now in use in the Bell System, save much time in analyzing the 
sample results. The results are first summarized for each of the sub- 
samples, and the subsample average is computed. The overall average 
for the entire sample is the average of the subsample averages. To com- 
pute the precision range when 10 subsamples are selected, it is necessary 
only to add the squares of 10 differences, divide by 10, and extract the 
square root.⁸ Random subsamples are also useful in testing for possible
bias in multistage sampling.⁹
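
The shortcut can be checked with a short sketch. It assumes that the "10 differences" are the differences between each subsample average and the overall average; the function name is hypothetical. With 10 subsamples the result equals 3 times the usual estimate of the standard deviation of the overall mean, since the sum of squares divided by 10 is 9 times the sum of squares divided by 10 × 9.

```python
import math

def precision_range(subsample_means):
    """Overall average, shortcut precision range, and standard error of the mean."""
    k = len(subsample_means)
    overall = sum(subsample_means) / k
    sum_sq = sum((m - overall) ** 2 for m in subsample_means)
    shortcut = math.sqrt(sum_sq / k)             # add squares, divide by k, take the root
    std_err = math.sqrt(sum_sq / (k * (k - 1)))  # usual standard error of the mean
    return overall, shortcut, std_err

# Hypothetical subsample averages, for illustration only.
overall, shortcut, std_err = precision_range([float(i) for i in range(1, 11)])
```

The factor of 3 holds only for 10 subsamples; for k subsamples the same shortcut gives the square root of k − 1 times the standard error.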


CONCLUSIONS 


The following is quoted from a manual on sampling issued by the 
American Telephone and Telegraph Company: 


“Properly used, sampling procedures are among the most powerful and 
valuable tools at the statistician’s command. Improperly or carelessly used, 
they are among the most dangerous. Particularly careful attention should 
be paid to the selection of the sample. Too often, perhaps, insufficient con- 
sideration has been given to the matter of selection, with the result that 
badly biased samples have been put forth as reliable simply because the 
samples were large. This point cannot be overstressed. A sample can never 
be better than its method of selection.” 


REFERENCES 


[1] Cochran, William G., Sampling Techniques. New York: John Wiley & Sons, 
1953. 

[2] Deming, William Edwards, Some Theory of Sampling. New York: John Wiley 
& Sons, 1950. 

[3] Grant, Eugene L., Statistical Quality Control. Second Edition, New York: 
McGraw-Hill, 1952. 

[4] Hansen, Morris H., Hurwitz, William N., and Madow, William G., Sample 
Survey Methods and Theory. New York: John Wiley & Sons, 1953. 

[5] Yates, Frank, Sampling Methods for Censuses and Surveys, Second Edition. 
London: Charles Griffin & Company, 1953. 


8 The precision range, computed as suggested here, is equal to 3 times the estimated standard deviation
of the overall sample mean. This measure of precision is convenient in many applications of
sampling. The measure itself can be made more precise if independent samples are selected for several
groups of items and the sample means and variances are then combined by employing Cochran's [1]
formulas (5.1) and (5.7), pages 66 and 69.

9 The sample average is likely to have some degree of bias unless the number of units to be selected
is predetermined for each stage of the sampling procedure. Otherwise, the sample average is the ratio of
two random variables: the sum of the sample measurements and the number of units selected. Where
random subsampling is employed, this bias can conveniently be investigated by methods suggested in
[1] and [4] for finding the bias of a ratio-estimate.









ESTIMATION OF THE BRAZILIAN COFFEE HARVEST 
BY SAMPLING SURVEY 


W. L. Stevens 
Universidade de São Paulo


This article describes the results of an investigation, carried 
out on behalf of the Instituto Brasileiro do Café, into the possi- 
bilities of estimating the Brazilian coffee crop by means of a 
permanent sampling survey. 

The state of São Paulo already has a general sampling survey
for the principal crops. With some modification, this
would furnish estimates with adequate precision for coffee.
In the states of Minas Gerais, Rio de Janeiro and Paraná,
there exist registers of property owners suitable for use as
frames. From them could be constructed indexes of coffee
growers, so that the primary sample would be drawn from this
index, with a supplementary sample from the remainder. From
data of the pilot survey, an estimate is made of the volume of
work which would be needed.

The state of Paraná has recently been surveyed by aerial
photography. It was found that this photograph provides a
very satisfactory frame for sampling. The state of Espírito
Santo possesses no register of property owners, and it is suggested
that the sampling survey in that state will have to
await a survey by aerial photography.


1. BACKGROUND


How to make reliable estimates of the coffee crop is an urgent problem
for Brazil, because Brazilian coffee export is about 50 per cent
of world export and about 70 per cent of all Brazilian exports. The
problem indeed is more than economic: friendly relations with the
United States were recently strained by rocketing retail prices which
were, in part, the product of incorrect estimates of the effects of frost.

In only one state of the Federation—in the state of São Paulo—is
there in existence a permanent sample survey for the principal agricultural
products. In all other states estimates of harvests are of a purely
subjective character.

Early in 1954, the writer was asked by João Pacheco e Chaves—then
Director of the Instituto Brasileiro do Café—to advise on methods of
sampling survey for coffee crop estimation in the principal states producing
coffee for export (see Table 1).

As no information was available—outside São Paulo—on which to
base recommendations, the writer replied that it would be necessary to
make an investigation in the field, which was accordingly undertaken



TABLE 1
COFFEE EXPORTING STATES: AREAS AND EXPORT, 1953

State             Area (thousand ...)    Export (million sacks of 60 kg.)

São Paulo
Paraná
Minas Gerais
Espírito Santo
Rio de Janeiro
Others

Total                                    15.1





during 1954, and paid for by the Instituto Brasileiro do Café. The 
detailed report of the investigation was presented in October, 1954. 
This article discusses only aspects of general interest and it must be 
understood that the views expressed are purely those of the writer. 


2. GENERAL CONSIDERATIONS 


In a sampling survey for coffee—or indeed any crop for which the in- 
formation is supplied by the farmer—the elementary sampling unit is 
necessarily the farm. True area sampling is not practicable because 
boundaries of areas marked on a map will usually cut through farms 
and it is not generally possible to obtain separately, data for the portion 
of the farm lying inside the marked area. 

Since the farm is the ultimate sampling unit, the most useful frame 
would be an index of farms or more generally, an index of rural proper- 
ties, since, in a country like Brazil, there are many properties not yet 
used for farming. A true index of rural properties identifies each prop- 
erty on a large scale map showing boundaries of all properties. Such 
an index possesses the enormous advantage, for sampling purposes, of 
being, by its very nature, free from omissions and duplication. No such 
index exists anywhere in Brazil. 

As an alternative, one may consider using, as a frame, an index of 
property owners. This is by no means the same as an index of properties, 
since its freedom from omissions and duplications depends entirely up- 
on the assiduity and competence of the staff which maintains the in- 
dex. There may be omissions, when an owner fails to register his prop- 
erty. There may also be duplications, when ownership of a property is 
in dispute and both disputants make a registration or, again, if on the 







occasion of transference of ownership, an entry for the new owner is
added without the entry of the old owner being deleted.

An index of owners is at a further disadvantage in comparison with
an index of properties based on a topographical survey. In the former,
if there is much delay in registering a change of ownership, it becomes
difficult to identify the property; in the latter, it is not even necessary
to record the owner, although it may be convenient to do so. Again,
changes in area of a property arising from sale of part of the property
or purchase of a part or the whole of a neighboring property are more
difficult to recognize if we use an index of owners than if we have an
index of properties based on a map.

This means that, if we propose to use as a frame an index of rural 
property owners, we must satisfy ourselves, not only that it is reason- 
ably free from omissions and duplications, but also that it is being kept 
up-to-date. 

Indexes of property owners will be found whenever there is a tax 
on rural properties. Of the five coffee-producing states, all except 
Espírito Santo have a rural property tax levied by the state government.
In the state of Paraná, there is also a tax on properties, levied
by the counties (municípios)—or at least by many of the counties.
Accordingly, there exist in that state two independent indexes of prop- 
erty owners. 

Although a satisfactory general sampling survey may be based on an 
index showing nothing more than total area of the property—and in 
fact that is the basis for the general survey in the state of São Paulo—
the survey for coffee will not be efficient unless we have supplementary 
information on that crop. Consequently, we must investigate the possi- 
bility of supplementing the general index of property owners with 
information of the number of coffee trees or of the production in pre- 
vious years. 

Instead of an index, one may use a map as a frame for sampling 
properties. One needs however a very good map on a large scale, be- 
cause it is necessary to: 


(i) identify, on the ground, any point or area marked on the map, 
(ii) draw on the map the boundaries of any farm visited, or 
(iii) as an alternative to (ii), locate on the map a definite point of 
the farm, such as its most northerly point or the position of the 
farmhouse. 


No maps satisfying these requirements exist in Brazil. 
What does exist however, in the state of Paraná, is something even







more useful—an aerial photograph of the whole state, completed in
1953, on the nominal scale of 1/25,000 (varying, in fact, from 1 in
24,000 to 26,000). On this photograph, one can, without difficulty, 
recognize coffee plantations. 

Although it would be possible to use an index of property owners as
a frame for sampling in Paraná, it was clearly of very great interest to
see what could be done with an aerial photograph, not only for use in
Paraná, but also as possibly the only practicable solution in Espírito
Santo, where, as has been noted, no index of property owners exists.


3. STATE OF SÃO PAULO


As a general sampling survey for the principal agricultural products 
has been in operation in São Paulo since 1951, the question here was
not to organize an independent survey, but to ensure that the general 
survey would yield estimates of sufficient precision for coffee. 

The frame used is an index of rural property owners, which is organ- 
ized by the “Home Secretariat” (Secretaria da Fazenda) for the purpose 
of levying a tax on rural properties. The Secretariat’s county tax 
offices prepare each year, a typewritten alphabetical list of property 
owners. A copy of the list is sent to the Secretariat of Agriculture where 
it is reproduced on Hollerith cards. This Hollerith file is the population 
from which the sample is drawn. The information supplied is name of 
owner, total area, annual tax (not used) and address. 

Where the property is large, its name is a sufficient identification. 
Small properties often have no name and their addresses are often 
less complete than they should be. The Home Secretariat has however 
been asked to instruct county officers to give adequate addresses. 
Even when the address of a sampled farm is insufficient for identifica- 
tion by the field worker, he can almost always make the identification 
by consulting the local tax office. 

The original forms, filled in by the owners, contain further informa- 
tion on the use of the land. This information is unreliable and is not 
used in the survey. Small discrepancies in total area are common, but 
the few gross errors have been due to mistakes in transcription. 

With regard to the possibility of omissions, the a priori argument, 
if unscientific, is very strong. Registration of the property and regular 
payment of taxes is, to some extent, evidence of title. It is inconceivable 
that the owner of a valuable farm will try to keep it out of the register 
in order to evade a relatively small tax which, in any event, will come 
home to roost with compound interest added, when the farm is sold or 
transmitted to an heir. Doubtless there are forgotten and abandoned 







properties not on the register, but, as these are unproductive, their 
omission will not affect estimates of production. 

On the contrary, one worries more about duplication. It is known that 
the same piece of land is sometimes registered by two or more claimants. 
This can happen only in undeveloped or recently developed counties, 
and even there, the county tax officer is usually aware of the situation. 
Whatever the amount of duplication may be—and it is very small—it 
will necessarily tend to lessen, as new areas are brought into agricul- 
tural use and disputes of ownership thereby brought to light. 

Changes of ownership are normally reported to the tax office by the 
new owner. However, even if he omits to do this, a copy of the registry 
of sale will, in due course, be sent in by the notary who registered the 
sale. 

An objective check on omissions is provided by listing the owners 
of all properties contiguous with some or all of the sampled properties 
and subsequently trying to identify these in the register. The method 
has yet to be used extensively in São Paulo, but it was used successfully
in the pilot survey in other states. If duplications exist one will, sooner 
or later, include the same property more than once in the sample. This 
has never yet happened. 

The chief source of error is failure to establish a “one-one correspond- 
ence” between properties in the register and properties in the field. 
A typical case is when a farm is divided formally between the heirs 
but continues in operation as a single unit. It is for this reason, that the 
total area is an important means of identification. The field-workers 
are instructed to investigate any large discrepancy between area in 
register and area declared by the farmer on the occasion of the visit. 

Experience over several years has shown that the register of property 
owners in the state of São Paulo is a satisfactory, if not perfect, frame of
reference. 

The sample, of 1500 farms, is stratified geographically and by total 
area of property, with class limits, in hectares— 


3—10—30—100—300—1000, etc. 
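
Assigning a property to its size stratum from these class limits might be sketched as below. The list stops at 1,000 hectares because the article gives the further limits only as "etc."; whether a property lying exactly on a limit belongs to the upper class is an assumption, and the function name is hypothetical.

```python
from bisect import bisect_right

# Class limits in hectares as printed; limits beyond 1000 are not given in the text.
LIMITS = [3, 10, 30, 100, 300, 1000]

def area_class(hectares):
    """Index of the size class: 0 for under 3 ha, 1 for 3 to under 10 ha, and so on."""
    return bisect_right(LIMITS, hectares)

classes = [area_class(a) for a in (2, 3, 50, 5000)]
```

Each limit is roughly three times the one before it, so the classes are of about equal width on a logarithmic scale of area.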


The work of visiting the farms is undertaken by the officers of the 
agricultural advisory service, each of whom has charge of a “region” 
consisting of from one to five counties. With the intention of spreading 
the work more or less equitably, the sample is drawn so that a region 
contributes on the average 10 farms, and never more than 14. This is
not an entirely satisfactory method, because it happens that some parts 
of the state which have recently greatly expanded production of coffee
and other crops have not received a proportionate increase in numbers
of regional officers, with the result that they are under-represented in 
the sample. 

The sampling error for the coffee estimates is about 9 per cent, which 
is too high to be acceptable. 

The Secretariat of Agriculture has, however, begun to organize an 
index of coffee growers. This will show the number of trees and it can 
be used as supplementary information for improving the precision of 
the coffee estimates. If this improvement is not enough, it may become 
necessary to increase the intensity of sampling in the coffee-growing 
regions, even if this means that the additional work will have to be 
done by independent observers. 

It may be noted that the forecasts of harvest are based entirely on 
the forecasts made by the farmers. There is no possibility, at present, of 
making more objective forecasts by sampling crops in the field. The 
regional officers have neither the time nor the training for such work. 
In the course of time, the analysis of data will make possible a calibra- 
tion of farmers’ forecasts in order to remove biases. 

As the lines of development for obtaining satisfactory forecasts and 
estimates in the state of São Paulo are sufficiently clear, we may now
turn to the other states. 


4. REGISTERS OF PROPERTY OWNERS 


The states of Minas Gerais, Rio de Janeiro and Paraná all possess
registers of rural property owners, organized for the levying of state 
taxes. In the first two, the registers are maintained in the taxation 
offices located in each county, while in the last the organization is 
centralized in the state capital. An objective check was made on the 
quality of these registers by listing all neighbors of farms visited in the 
pilot survey and trying subsequently to identify these in the registers. 

In the first two states the conclusion was reached that the registers 
are sufficiently up-to-date and free from omissions and duplications to 
serve as frames. In Paraná the register is not quite as satisfactory.
There appear to be delays in recording changes of ownership, perhaps 
as a result of the centralization of the work. There are also many spell- 
ing errors in names, which can be a nuisance in a file kept in alphabetical 
order. No doubt these minor deficiencies could be overcome. 

Fortunately, however, in the state of Paraná, the prefeituras (county
governments) levy their own tax on rural properties, with the result that
a card index or register of properties, entirely independent of the state
register, can be found in the county halls (or at any rate, was found in
the half-dozen counties visited in the investigation). It was found that
these county registers are in very good order and are therefore suitable
for use as frames. 

The state registers all give total area of property. Information about 
coffee either is not given or, where given, is quite unreliable. It was 
therefore necessary to investigate the possibility of obtaining this in- 
formation. 

The state of Minas Gerais, in addition to the tax on properties, 
levies a tax payable directly by the grower, on sale of coffee. For fiscal 
purposes, the state Finance Secretariat has instructed the local offices 
to set up and maintain a card index of coffee growers. The blank card 
provided for this purpose shows that the information sought is name 
of owner, address of farm, number of trees in production, number not 
yet producing, annual harvest and individual sales. 

It is evident that such an index would be of great value for sampling. 
Unfortunately, it is not yet in proper working order. In some counties, 
the cards had not even arrived. At the best, the information was 
deficient, because the name appearing on the card was often not that of 
the owner of the farm where the coffee was grown, but of someone 
else—a share-cropper who grew it, a neighboring farmer who threshed 
and sold it, or even a merchant who was not a farmer at all. For reasons 
which will appear, it is essential that each coffee grower in the index 
of growers should be identified with a property owner in the general 
register. However, the index of growers has only just been started and 
there should be no difficulty in getting it properly organized if the local 
officers are given clear instructions. 

In the state of Rio de Janeiro, no index of growers exists. An experi- 
ment was therefore made, in two counties, to see how one could be 
organized. A group of leading coffee farmers and merchants was called 
together in the local tax office and names were read to them from the 
general register. It was found possible in most cases not only to ascertain 
whether the farm had coffee but also to obtain an approximate idea of 
the number of trees. It is evident that an index constructed by such a 
method may contain numerous omissions but, as will appear later, 
these will not invalidate the estimates. 

In the state of Paraná, some counties show number of trees on their
card index and some do not. Where the information is not available,
the method tested in Rio de Janeiro could doubtless be used.


5. SAMPLING METHODS 


Assuming that a card-index of growers has been set up, we may now 
inquire how it can be used for sampling purposes. The supplementary 
information can be number of trees, previous year’s harvest or average
of a series of years. At first one might think that previous harvest
would be more useful than number of trees but an analysis of the data 
from the pilot survey did not confirm this. In fact, coffee, like other 
perennial crops, shows strong fluctuations from one year to the next. 
Possibly an average of several years’ harvest would be better; there was 
hardly sufficient data to examine the point. There are also very strong 
reasons of a practical nature for using number of trees rather than 
previous harvest. It is usually easier to obtain reliable information on 
number of trees. For example, a farmer can often say how many trees 
his neighbors have, when he does not know, or does not like to reveal,
their last year’s harvest. Moreover, number of trees is a fairly stable 
datum so that it would not matter seriously if the information were not 
rigorously brought up-to-date every year. It therefore appeared of 
interest to see what could be done, using number of trees as supplemen- 
tary information. 
Two methods of sampling are available: 


(i) selection with probability proportional to number of trees, 
(ii) stratification by number of trees, with variable sampling fraction. 


The former method, involving as it does a large number of division 
sums, is not one which recommends itself to the writer. Let us pass to 
the second. 

Some 190 farms were visited in the pilot survey. In most cases infor- 
mation was obtained of harvests for three years, that of the last year 
being often a forecast. As the harvests vary greatly from year to year, 
the data are practically equivalent to those which would be obtained
from three independent samples. The sample was supplemented by some
30 farms in the state of São Paulo.

The sample was stratified by number of thousand trees in production 
with the following class limits 


1—3—10—30—100—300—1000 


It was further stratified geographically according to the amount of 
data available. It was then found that the coefficient of variation inside 
substrata was stable, with an average value of 0.674. Assuming that 
the sampling fractions are proportional to the standard deviations, we 
therefore arrive at the following number of farms for a given sampling 
standard error in the estimate: 








    Standard Error (per cent)    Number of Farms
              10                        50
               5                       180
               3                       500


This, of course, takes no account of the effect of errors in the statement 
of numbers of trees. Such errors will tend to increase the standard errors
of the estimates. 
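
The farm counts in the table above can be checked with a short sketch.
It assumes the simplest relation between the relative standard error and
the sample size, SE = CV / sqrt(n), which is an assumption made here for
illustration and not a formula stated in the paper. The function name
farms_needed is hypothetical. The results land close to the rounded
figures of 50, 180 and 500 given in the table.

```python
import math


def farms_needed(cv: float, target_se: float) -> int:
    """Smallest n for which CV / sqrt(n) does not exceed target_se.

    Assumes SE = CV / sqrt(n); this ignores finite-population effects
    and errors in the stated numbers of trees.
    """
    return math.ceil((cv / target_se) ** 2)


# Average coefficient of variation inside substrata, as quoted in the text.
CV_WITHIN_SUBSTRATA = 0.674

for target in (0.10, 0.05, 0.03):
    n = farms_needed(CV_WITHIN_SUBSTRATA, target)
    print(f"target standard error {target:.0%}: about {n} farms")
```

The unrounded values are a little different from the tabulated ones,
which appear to have been rounded to convenient figures.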

On the other hand, it can be shown that the classification used is
coarse enough to increase appreciably the standard error. It would be
advisable to use the slightly finer classification:


1—2—5—10—20—50—100—200—500—1000 
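
Assigning a farm to its size class from these limits is a simple lookup.
The sketch below is illustrative only: the name stratum is hypothetical,
and treating each limit as the lower bound of its class is an assumption,
since the text does not say on which side a boundary value falls.

```python
from bisect import bisect_right

# Class limits in thousands of trees in production, as listed in the text.
CLASS_LIMITS = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]


def stratum(thousand_trees: float) -> int:
    """Return the index of the size class containing a farm.

    Index 0 means fewer than 1 thousand trees; index k (k >= 1) means
    CLASS_LIMITS[k - 1] <= size < next limit; the last class is open-ended.
    """
    return bisect_right(CLASS_LIMITS, thousand_trees)


# A farm with 7,500 trees falls in the class from 5 up to 10 thousand.
print(stratum(7.5))
```

A variable sampling fraction can then be attached to each class index.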


It must not be supposed that the index of growers will be complete;
indeed there may be a considerable number of omissions. In any case,
even if it were complete, one would still want to estimate the number of
new trees, for which purpose one must sample farms not yet producing.

It is therefore necessary to sample “non-growers,” i.e., all rural 
property owners left in the general register after eliminating those 
which appear in the index of growers. For these, a sample stratified by 
total area may conveniently be taken. The number required will de- 
pend of course on the completeness of the index of growers. It was sug- 
gested that, in the beginning, a sample of “non-growers” at least as 
big as the sample of growers should be drawn. It should be noted, however,
that the field work is not proportionately increased because those 
which, in fact, do not carry coffee are visited only once. 


6. SAMPLING FROM THE AERIAL PHOTOGRAPH 


In view of the great possibilities offered by the excellent aerial survey
of Paraná, it is unfortunate that the writer succeeded in obtaining
the photograph of only one county. The investigation therefore does 
not have the generality which one would like. 

The simplest way of using an aerial photograph is to select points at 
random or systematically and visit the farms within which the points 
fall. The boundaries of the farms are then drawn on the photograph. 
For each farm in the sample the ratio (production)/(area in the
photograph) is calculated. The average of these ratios, multiplied by the total
area of the photograph, then gives the estimate of production of the 
region covered by the photograph. 
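
The estimate just described, the mean of the per-farm ratios multiplied
by the total area, can be sketched as follows. The function name, the
units and the three sample farms are invented for illustration and are
not data from the investigation.

```python
def estimate_production(farms, total_area):
    """Mean-of-ratios estimate of production for the photographed region.

    farms: list of (production, area_in_photograph) pairs for the farms
           in which the sample points fell.
    total_area: total area covered by the photograph, in the same units
                as the farm areas.
    """
    ratios = [production / area for production, area in farms]
    return total_area * sum(ratios) / len(ratios)


# Hypothetical sample: production in bags, area in hectares.
sample = [(120.0, 40.0), (0.0, 25.0), (300.0, 60.0)]
print(estimate_production(sample, total_area=10_000.0))
```

Farms without coffee enter with a ratio of zero, which is one reason the
ratio varies so much from farm to farm under this method.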

Using this method, it was found that the ratio has a coefficient of
variation of 1.19, the number of farms needed in the sample being
accordingly:

    Standard Error (per cent)    Number of Farms
              10                       140
               5                       560
               3                      1570

This method is not efficient, since it makes no use of the fact that the
coffee plantations can be recognized in the photograph.

As an alternative, we can take points at random and select the farm
only if the point falls inside a coffee plantation. The denominator of
the ratio is then the area of coffee plantations (in the photograph)
belonging to the farm and the final multiplier is the total area of coffee
plantations in the photograph.

Using this method, it was found that the standard error is the same
as for a sample drawn from an index of coffee growers stratified by
numbers of trees. This is not surprising, since the supplementary
information is of the same nature: area of plantations in photograph
instead of number of trees.































The method would, however, after a few years, cease to provide
valid estimates because it excludes from the sample farms which
had no coffee when the photograph was taken but which are subsequently
planted with coffee. In order to bring these into the sample, it
would be necessary to use a combination of the two methods, for example:
place a network of points on the photograph and choose all
which fall in coffee plantations and, say, one-tenth of those which do
not. Then the difficulty is that there does not appear to be any efficient
way of making estimates which does not necessitate complicated
calculations.

How, then, should the photograph be used? One could make an index
of all rural properties with boundaries marked on the photograph, but
this would take many years. The solution which appears most satisfactory
to the writer is to mark the boundaries of the very large farms
and join them by lines following the boundaries of smaller properties,
without, however, troubling to survey all the smaller properties. This
would divide the whole photograph into areas to serve as sampling
units. These would be stratified by area of coffee plantations in the
photograph. In the course of time, this information would be substituted
by number of trees, so that by the time the photograph had become
obsolete with respect to its information about coffee, this information
would no longer be used. Since the substitution can take place
over many years (except perhaps in new regions which are being
opened up very rapidly), the collecting of the information on number
of trees can be undertaken during the routine of field work. It will
become available in any case for areas in each year’s sample, and to
these can be added information of all areas contiguous to the sampled
areas. In addition, in the course of time, areas containing many
properties can be progressively broken down.











By these methods, it would be possible to start the sampling survey 
fairly soon and to ensure that it continues to operate when the photo- 
graph no longer provides valid information on area with coffee. 

As to the practicability of the work, the investigation did show
that:

(i) it is not difficult to reach, on the ground, any point marked on
the photograph;

(ii) it is not difficult, though it requires care, to draw on the
photograph the boundaries of any farm visited.


7. ESPÍRITO SANTO


There is no register of rural property owners in the state of Espírito
Santo. To set up and maintain such a register would be too costly if 
its only purpose were for coffee estimation, though perhaps it could be 
considered if the state government decided to make surveys of all 
principal crops. One may ask whether it would be possible to con- 
struct an index of coffee-growers. The problem here would be very 
much more difficult than in the other states because, in the absence 
of a general register, it would be necessary to ensure an absolutely 
complete index of growers. 

An experiment was made to see whether such an index could be 
constructed from information contained in the commercial books of 
coffee buyers. The result was certainly not encouraging. It would en- 
tail a great amount of work with no guarantee that it would yield a 
complete index. Moreover, as it is not practical politics legally to com- 
pel the buyers to open their books to inspection, an uncooperative 
attitude of one or two would ruin the plan. 

One can also devise schemes for compelling the coffee grower to 
register his property, but it is doubtful whether they would prove 
successful in practice. 

The writer came to the conclusion therefore, that the problem will 
only be solved satisfactorily by first making a survey by aerial photog- 
raphy. This would cost about $50,000 but there is no reason why the 
whole of this sum should be charged to the coffee survey. The photo- 
graphs would be used also for making maps: maps now available are 
very poor. 


8. OBJECTIVE FORECASTING AND INTERNAL CONSUMPTION 


No suggestion is made, as yet, to make forecasts by sampling the 
crop in the field. To do this it will be necessary to carry out sampling
experiments over several years at different centers, in order to perfect
a method. Subsequently, it will be necessary to train the observers 
because estimates will be biased unless the sampling plan is rigorously 
executed. 

Sampling of the crop in the field, even if restricted to a sub-sample 
of the farms, would greatly increase the work. It is not obvious that it 
would provide forecasts superior to those made by the farmer, provided
these are calibrated, as will be possible when data for several
years become available. 

In order to estimate coffee available for export, it will be necessary
to subtract the internal consumption, about which nothing is
yet known. Plans are now being made for a sampling survey to esti- 
mate internal consumption. 


9. PLAN OF WORK 


In order to estimate the cost of the survey, it is necessary to know 
what standard error can be tolerated. This is a job for the economist 
rather than the statistician but, in the absence of any definite pro- 
nouncement, the writer suggests that it would be reasonable to aim at 
a standard error of 4 per cent for each of the three separate estimates
(São Paulo; Minas Gerais and Rio de Janeiro taken together; and Paraná),
and a standard error of 6 per cent for Espírito Santo. This would give
a standard error of 2.3 per cent in the total.
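
How the separate standard errors combine into one for the total can be
sketched as follows, assuming the four estimates are independent. The
shares of total production used below are placeholders chosen for
illustration; the paper does not give the shares it used, and the names
are hypothetical. With these particular placeholder shares the result
happens to fall near the quoted 2.3 per cent.

```python
import math


def combined_relative_se(parts):
    """Relative standard error of a total built from independent parts.

    parts: list of (share_of_total, relative_se) pairs, with shares
           summing to 1. Each part contributes (share * se) ** 2.
    """
    return math.sqrt(sum((share * se) ** 2 for share, se in parts))


# Placeholder shares (NOT from the source), standard errors in per cent.
parts = [
    (0.45, 4.0),  # Sao Paulo
    (0.20, 4.0),  # Minas Gerais and Rio de Janeiro together
    (0.25, 4.0),  # Parana
    (0.10, 6.0),  # Espirito Santo
]
print(round(combined_relative_se(parts), 2))
```

A different set of shares would give a different total, so the sketch
shows the form of the calculation and no more.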

For Minas Gerais and Rio de Janeiro together one may therefore 
suggest a sample of 320 coffee growers and a sample of 320 “non- 
growers.” Assuming that the index of growers is fairly complete, the 
number of farms to be visited in the second and subsequent surveys 
would be not much greater than 320. The opinion here is in favor of 
a series of separate surveys, each survey to take not more than 3 
weeks, rather than a continuous survey all through the season. The 
pilot survey showed that an average of 4 visits per day should be easily 
achieved, once the farms have been located. With 4 jeeps, 320 farms 
would therefore be visited in 20 days. The writer expressed the opinion 
that each jeep should contain 2 persons, not only to ease the strain 
of driving over long distances but also to keep the work going if one 
person is ill. However, the counter suggestion is for 5 jeeps each with 
only one person. 
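
The field-work arithmetic above is easy to check. The sketch below uses
the figures quoted in the text (320 farms, 4 visits per jeep per day);
the function name is hypothetical.

```python
import math


def survey_days(farms: int, jeeps: int, visits_per_jeep_per_day: int = 4) -> int:
    """Working days needed to visit every farm once."""
    return math.ceil(farms / (jeeps * visits_per_jeep_per_day))


print(survey_days(320, jeeps=4))  # the writer's plan of 4 jeeps
print(survey_days(320, jeeps=5))  # the counter suggestion of 5 jeeps
```

Under the same visiting rate, the five-jeep plan finishes in fewer days
but leaves no second person to carry on if an observer falls ill.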

The first survey of each year would take much longer. In each 
county, it would begin with a revision of the index of growers by com- 
parison with the register in the state tax office. The two samples would 
then be drawn systematically within classes of number of trees and
total area respectively. These farms would then have to be located and
visited. Experience in the pilot survey showed conclusively that on
the first visit the observer must be introduced to the farmer by a local
person who commands the farmer’s confidence: if not, the information
supplied will often be false. The first survey may therefore extend over
a couple of months and would be carried out during the flowering
season, September to November. While some sort of forecast could
perhaps be made, the information sought would be mainly of age
distribution of trees, methods of cultivation, equipment, etc.

Forecasts and estimates would be made in three subsequent surveys, 
before, during and immediately after the harvest. These might have 
to be supplemented by extra surveys after unexpected climatic con- 
ditions, such as a frost. 

The amount of work for the state of Paraná would be about the
same—perhaps less, as the coffee growing is more concentrated. There 
would however be additional work in the first year in marking out 
areas on the photographs, if that method were used. 


10. CONCLUSIONS 
The result of the investigation may now be summarized: 


(i) Estimates in the state of São Paulo can be obtained through the
present general survey, using an index of growers for supplementary 
information and probably increasing the sample in the coffee growing 
regions. 

(ii) A sampling survey can be established in Minas Gerais and Rio 
de Janeiro, using as frames an index of coffee growers and the tax 
registers. The index of coffee growers would take only a few months 
to prepare but might prove unsatisfactory in the first year. It would 
however become adequate in a year or two. 

(iii) The aerial photograph of the state of Paraná can be used as a
frame for a sampling survey. A year or maybe more would have to 
be spent on mounting the photographs and marking areas. 

(iv) There is no way immediately available for establishing a satisfactory
sampling survey in Espírito Santo. The problem will be solved
if interested parties can be persuaded to make a survey of the state 
by aerial photography. 

(v) The cost of a permanent sampling survey would be small in 
comparison with the great losses suffered by the farmers as a result 
of the inadequacy of the present methods of estimating and fore- 
casting. 





ON THE RELIABILITY OF RESPONSES SECURED IN 
SAMPLE SURVEYS* 


ROBERT FERBER
University of Illinois 


The reliability with which information is obtained on sample surveys
has long been a subject of widespread interest. The ways in which
such information may be biased border on the infinite,1 and it is well
recognized (though perhaps not always fully appreciated) that the
sampling variation in sample-derived estimates of population
characteristics may be negligible relative to the biases introduced at
other stages in the operation. As a result of this recognition, the
literature on the detection and avoidance of bias in the sampling
operation has grown tremendously in recent years.2

This study represents, it is hoped, a new contribution to this litera- 
ture, focussing on an aspect of the subject which, although perhaps one 
of the greatest transgressors of them all, appears to have received little 
attention. This is the variability in response of different family members
to the same question. In other words, how consistent are the replies
of various family members on questions relating to different family
characteristics—income, family size, purchases, planned purchases, 
and others? As will be shown in this paper, the consistency on many 
questions is very limited, and accordingly serious consideration has to 
be given to the selection of the respondent, as well as of the family or 
spending unit, in the great number of economic and marketing surveys 
designed to elicit family data from one member. Suggestions of various 
means of handling this problem are offered in the final section of this 
paper, following the presentation of the main results of the study. 


* This study was made possible by aid furnished by the Bureau of Economic and Business Research
of the University of Illinois. The author would also like to acknowledge the capable assistance
provided by Phyllis Barnard, Frances Dotson, Jean Robbins and Raymond Twery at various stages of the
operation.

1 As any attempted breakdown of the main sources of bias readily reveals. See Deming, W. E., “On
errors in surveys,” American Sociological Review, 9 (1944), 359-69.

2 As is evident from a perusal of bibliographies on the subject in such sources as Parten, Mildred,
Surveys, Polls, and Samples: Practical Procedures, New York: Harper and Brothers, 1950, and Ferber,
Robert, Statistical Techniques in Market Research, New York: McGraw-Hill Book Company, 1949.


THE STUDY


In the latter part of 1951 and the first half of 1952 a randomly
selected sample of families in Champaign-Urbana was interviewed
in such a way that answers of adult members of the family to the same
question were obtained separately and simultaneously. The manner
in which this was accomplished involved three steps:

1. A personal letter to the family explaining the study as an
experimental one designed to test the reliability of consumer surveys
and through this knowledge to help eliminate unnecessary surveys
and improve the quality of those that are made. The family was
notified of its selection in the sample, was asked for cooperation,
and told that a staff member of the Bureau of Economic and
Business Research of the University would call for an appointment.

2. A visit by a staff member to the household and further explanation
of the survey. If cooperation was secured, an appointment was
made for a time when all the adult members would be at home.

3. The interview itself. Each adult family member was given a
pencil and a questionnaire form and asked to write in his or her
answers to the various questions as the interviewer read them.
The interviewer also made sure that there was no “cheating.” If
through one mischance or another, family members did see the
others’ answers before writing in their own, the interview was
discarded.

The questions asked related to the following subjects:

1. Nature and characteristics of durable goods purchased during the
last six months: who made the purchase, who exerted the most
influence in the purchase decision, reason for purchase, and how
it was financed.

2. Durable goods purchase plans for the following six months: who
was largely responsible for each plan, when and where the purchase
might be made, and contemplated method of financing.

3. Opinion regarding the relative influence of each family member on
the purchase of specific durable goods.

4. Present economic status.

5. Economic and political expectations.

6. Family characteristics: size, ages, occupations, income, presence
or absence of family budget.


In this article, we shall examine the consistency of the responses on
all of the questions except those relating to opinions of the purchasing
influence of the various family members with regard to each other.
The latter is a complicated matter in itself and has therefore been made
the subject of a different study.3

Before considering the actual findings of this study, a few comments
on the nature of the sample would seem pertinent.

The sample members were selected at random from a listing of dwelling
units in the Champaign-Urbana area. Many of the original sample
families, however, either refused to cooperate or did not cooperate
sufficiently to provide completed questionnaires. All in all, a total
of about 556 families with two or more adults were approached and 237
completed sets of interviews were obtained, a response rate of 43 per
cent.

Because of this low rate of response (and especially because of a
38 per cent refusal rate), the composition of the sample differs from
that of the population in some respects. To judge by 1950 Census
statistics, it contains disproportionately high proportions of
two-member families and of families whose head is engaged in
professional work. In addition, the population itself,
Champaign-Urbana, is dominated to some extent by the presence of the
University of Illinois, and therefore may not typify in many ways
other American communities. Hence, generalizations drawn from this
study have to be made with great care. It is perhaps best to consider
this a case study on this subject although, as will be noted later, many
of the results would seem to possess a fair degree of generality.

3 See Ferber, Robert, “On the reliability of purchase influence studies,” Journal of Marketing,
January 1955, 225-33.


THE FINDINGS


Inasmuch as family characteristics tend to influence the consistency
of replies to the other questions asked, we begin with an examination of
the consistency of the replies on the few characteristics of the family
where such a check was possible. We shall then consider the consistency
of the replies with regard to purchases, purchase plans, present
economic status, and economic and political expectations.

Family Characteristics. There were three questions on each
questionnaire relating to family characteristics to which each family
member would be expected to give the same answer, namely, family size,
family income, and presence or absence of a family budget.

In the case of family size, there was little difference in the replies 
received from the same family. Only five differences appeared in the 
237 sets of replies, which is probably about as good as one might hope 
for. 

With respect to income, differences were much more frequent,
although the magnitude of these differences is difficult to assess because
respondents were requested to give the interval in which the family’s
income fell rather than the actual figure.4 The extent of disagreement
over the interval in which the family income lies is presented in Table
1, segregated by the number of answers obtained, or questionnaires
completed, from each family. Thus, two-answer families are those in
which two adults (18 years of age or more) were interviewed in this
study; they are not necessarily two-member families, or even
two-adult families, for family members under 18 years of age were not
interviewed and, in addition, not all of the adult members of every family
were always present at these interviews.5 Three-or-more-answer families
are, similarly, those from which at least three interviews were procured.
Clearly, the consistency of the replies depends on the number of
answers received per family, as well as on family size itself, which is
why both means of classification are employed in this study.


TABLE 1

EXTENT OF AGREEMENT ON FAMILY INCOME,
BY ANSWERS PER FAMILY

                                              Two-     Three-or-
Statistic                                    Answer   More-Answer     All
                                            Families    Families    Families

Proportion of families whose income is
  given in same interval                       71%        44%         67%
Proportion of families whose members
  disagree by one income interval or more      29%        56%         32%
Proportion of families whose members
  disagree by more than one interval            3%        15%          5%

Sample size*                                   182         27         209

* Excluding families in which all but one family member answered “don’t know” on the income
question.


4 The intervals used were: under $1,500; $1,500-2,599; $2,600-4,199; $4,200-6,599; $6,600-10,399;
and $10,400 and over. The limits of these intervals come out fairly neatly in terms of both weekly and
monthly income equivalents.

5 The actual distribution of sample families by size and by number of answers is as follows
[columns for smaller family sizes are not legible in this copy]:

    Number of answers        Family size:
    per family               5 or more

    2                            28
    3 or more                    12

    Total                        40


As is evident from Table 1, differences regarding the interval in 
which the family income falls were present in more than a quarter of 
the two-answer families and more than half of the three-or-more-answer 
families. Some of these differences may well have been fairly small, 
particularly where the family income might be near the limit of an 
interval. On the other hand, fairly substantial differences might have 
existed which were covered up by the use of income intervals, so that 
on balance the percentages given in Table 1 probably do not overstate 
the degree of disagreement existing within the sample families regard- 
ing their income. 

That the extent of disagreement should increase with the number of
answers secured per family is only to be expected in view of the statistic
used (a “disagreeing” family being one where at least one member's
answer departed from that of the others) and in view of the greater
likelihood of disagreement as the size of the family, particularly the
number of potential wage earners, increases. It might be noted as a
matter of interest that in six of the 27 families for which three or more
members reported an income figure, each member mentioned a different
interval.⁶
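The classification rule just described can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not the tabulation procedure actually used in the study; the function names and the example answers are hypothetical.

```python
def is_disagreeing(answers):
    """True if at least one member's answer departs from that of the others."""
    return len(set(answers)) > 1


def all_different(answers):
    """True if every responding member named a different income interval."""
    return len(set(answers)) == len(answers)


# Hypothetical income-interval answers, one list per family (illustrative only).
families = {
    "A": ["2,600-4,199", "2,600-4,199"],
    "B": ["2,600-4,199", "4,200-6,599", "2,600-4,199"],
    "C": ["under 1,500", "1,500-2,599", "2,600-4,199"],
}

disagreeing = [name for name, ans in families.items() if is_disagreeing(ans)]
share_disagreeing = len(disagreeing) / len(families)
```

Because a single departing answer is enough to mark a family as disagreeing, the measure can only become more likely to register a disagreement as further answers are added for the same family.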

Whether the number of potential income earners influences the con- 
sistency of replies to the income question was tested by subdividing the 
two-answer families by number of adults 14 years of age or over and 
recomputing the percentages shown in Table 1 for each group. Doing 
this reveals that the proportions of two-answer families whose members 
disagree by one income interval or more is larger for the families with 
more adults, rising up to 40 per cent for families having more than 
three members 14 years of age or more, but this difference is not signif- 
icant at the .05 probability level. 

In other respects, there was little a priori to distinguish families 
disagreeing on income from other families. The disagreeing families 
were not concentrated in any particular occupation (of head of family) 
or income group. 

As with income, many families could not seem to agree as to whether 
or not they kept budgets. This question was asked in two forms, 
namely: 

a) Do you have a family budget in the sense of keeping breakdowns
   of all expenditures by the week or month?





6 To avoid any possible confusion as to the entity for which an income figure was desired, the re-
spondent was asked to state his own income first and then the family income; the two items were speci-
fied on the questionnaire; and the interviewers generally reviewed this question as the answer was being
inserted.





RESPONSES SECURED IN SAMPLE SURVEYS 793 


b) Do you have a family budget in the sense of planning future
   durable goods expenditures on the basis of some schedule?

Table 2 presents the extent of agreement on these two questions by 
number of answers per family. In about a third of the families, the 
answer to this question would have differed according to which family 
member was interviewed. The answer would have been, in fact, exactly 
the opposite in most of these cases—more than three-fourths of the 
disagreements involved situations where one family member answered 
“yes” and another “no.” 


TABLE 2

EXTENT OF AGREEMENT ON PRESENCE OF FAMILY BUDGET
BY ANSWERS PER FAMILY*

                                              Two-        Three-Or-More-
                                              Answer      Answer
                                              Families    Families

Proportion of families agreeing
  (a) Breakdowns of expenditures              71%         41%
  (b) Planning durable-goods purchases        62%         39%
Proportion disagreeing
  (a) Breakdowns of expenditures              29%         59%
  (b) Planning durable-goods purchases        38%         61%

Sample size
  (a) Breakdowns of expenditures              195         32
  (b) Planning durable-goods purchases        165         31


* Excludes families with one or more members answering “don’t know.” 


The extent of agreement is lower for larger “answer-families,” in- 
dicating that unanimity of opinion within a given family regarding 
this question became progressively less frequent as the number of 
family members answering the question increased. It does not indicate 
that consistency declined as family size rose, for a tabulation by this 
breakdown revealed, if anything, the very opposite (though the trend 
was not statistically significant). This may well be due, however, to 
the fact that the answer to the budget questions was much more fre- 
quently “no” for the larger families. 

None of the socio-economic characteristics on which information 
was secured appeared to be related to the budget questions. No pro- 
nounced differences in consistency were evident by income, by occupa- 







tion of head, by age of head of the family, or by number of persons in 
the family 14 years of age or over. 

Purchases Reported. Comparison of the correspondence between
durable goods purchases reported by the various family members re-
vealed a degree of consistency that will appear shockingly low to many
people but probably not too surprising to those with extensive experi-
ence in conducting surveys. For only 16 (8 per cent) of the two-answer
families, and for none of the three-or-more-answer families, were the
same purchases reported by all the respondents.⁷ In other words, in only


TABLE 3

EXTENT OF AGREEMENT ON DURABLE GOODS PURCHASES
REPORTED BY TWO-ANSWER FAMILIES, BY FAMILY SIZE

(Per Cent of Total Families in Category)

                                   Two-       Three-     Four-Or-More-
Extent of Agreement                Member     Member     Member          Total
                                   Families   Families   Families

Agree on all items:
  Some purchases                     9%         5%          0%             5%
  No purchases                       7          0           0              3
Differ by one item                  14         15          10             13
Differ by two items                 18         15          15             16
Differ by three items               18         15          33             23
Differ by four or more items        34         50          42             40

Total                              100%       100%        100%           100%

Sample size                         90         40          73            203

















16 cases did the two respondents from the same family list the identical 
purchases or both state that no such purchases had been made during 
the period in question; the latter accounted for six of the 16 instances. 

The extent of disagreement on actual purchases within the sample 
families is presented in Table 3 for the two-answer families segregated 
by family size. (For the three-or-more-answer families, there was at 
least one respondent in every case but one who differed from the other 
family members by at least four items. In 6 of the 34 cases, no two of 





7 The actual question was, “Has anyone in your household bought any durable goods since last 
———————— (name of month approximately six months earlier), that is, items similar to those listed 
on card A?” Card A listed about 50 different durable goods. 







the three or more family members were in agreement on all purchases 
reported.) This table brings out two things. One is that contrary to 
what one might expect, the disagreements that did occur tended to be 
more frequently over three, four or more items than over only one or 
two items. For all the two-answer families combined, in nearly two- 
thirds of the families characterized by disagreements, the answers 
differed by three items or more; less than one disagreement out of every 
seven involved only one item. 
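The two proportions quoted in the preceding paragraph can be checked against the total column of Table 3. The following Python sketch assumes the rounded percentages printed in the table, so the results are approximate; the variable names are illustrative only.

```python
# Total column of Table 3 (per cent of all 203 two-answer families).
total_column = {
    "agree, some purchases": 5,
    "agree, no purchases": 3,
    "differ by one item": 13,
    "differ by two items": 16,
    "differ by three items": 23,
    "differ by four or more items": 40,
}

# Families characterized by any disagreement: 13 + 16 + 23 + 40 = 92 per cent.
disagree = sum(v for k, v in total_column.items() if k.startswith("differ"))

three_or_more = (total_column["differ by three items"]
                 + total_column["differ by four or more items"])

share_three_or_more = three_or_more / disagree                   # about 0.68
share_one_item = total_column["differ by one item"] / disagree   # about 0.14
```

On these rounded figures, disagreements over three or more items make up roughly two-thirds of all disagreements, and disagreements over a single item fall just short of one in seven.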

The second point is that the extent of disagreement increased rapidly 
with family size. About 16 per cent of respondents in two-member 
families listed identical purchases, whereas this was true in no case in- 
volving a family with four members or more. The proportion of families 
characterized by disagreements over large numbers of items was also 
higher among families with more members. 

Family size was not the only characteristic found to be related to 
consistency of replies on durable goods purchases. Families with heads 
50 years of age and over were significantly more consistent in their 
replies than those with heads under 35 years of age. Low-income 
families were significantly more consistent than those at other income
levels. Nearly 47 per cent of families earning less than $2,600 per year 
agreed on what they purchased during the previous six months as com- 
pared with only 6 per cent of families at higher income levels. Of course, 
the former did also report fewer purchases. This also may be the main 
reason why the only occupational difference of any significance was the 
much higher consistency of replies of families whose head had retired. 

It is interesting to note that families whose members disagreed on 
income or on budget records tended to be less consistent. Of the families 
whose members gave consistent answers on the income question, 
slightly over 10 per cent agreed on all durable goods purchases as 
compared to 2.1 per cent agreement for families whose members differed 
on the income figure—a highly statistically significant difference. The 
corresponding percentages for families agreeing or disagreeing on the 
budgeting of expenditures were 9.4 per cent and 4.7 per cent, respec- 
tively—statistically significant only at the .10 probability level. 

The material presented so far provides a general idea of the extent 
to which families are likely to disagree within themselves as to pur- 
chases reported. What is more important from the viewpoint of sample 
reliability, however, is the degree of understatement of specific types
of purchases likely to result from these disagreements. One indication 
of the extent of this understatement is provided by Table 4. For each 
of the purchase categories designated, this table indicates what pro- 







portion of the different total purchases reported by a respondent were 
omitted by any member of the same family. Thus, if two members of 
a family reported a particular purchase and one did not, this was 
counted as an omission.⁸ However, the same omission by two or more
family members was counted as only one omission. Hence, the 35 per 
cent “Total” figure in the first row of the table means that one or more 
family members neglected to mention a car purchase in 35 per cent 


TABLE 4

PROPORTION OF TOTAL REPORTED PURCHASES IN PARTICULAR
CATEGORIES NOT REPORTED BY ONE OR MORE FAMILY
MEMBERS, BY FAMILY SIZE

Family Size

Category    Three    Four or more

Cars and motorcycles  43%
Shoes  67
Clothing  71
Tires and tubes  71
Building materials  74
Furniture and rugs  78
Household furnishings (incl. kitchen utensils)  83
Personal items (mainly jewelry)  76
Major household appliances, incl. radio  63
Minor household appliances  47

Total*  61%  69%





* Includes purchases not classified above. 


of the instances in which other family members reported such a pur- 
chase—a startling figure indeed. 
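The counting rule behind Table 4 can be illustrated with a short Python sketch. It assumes that each member's report is available as a set of purchases; the function name and the example data are hypothetical and are not drawn from the survey.

```python
def omission_rate(families):
    """Share of distinct reported purchases omitted by at least one member.

    `families` is a list of families; each family is a list of sets, one set
    of reported purchases per responding member.
    """
    reported = 0
    omitted = 0
    for members in families:
        union = set().union(*members)         # purchases reported by any member
        common = set.intersection(*members)   # purchases reported by every member
        reported += len(union)
        # A purchase missed by two or more members still counts as one omission.
        omitted += len(union - common)
    return omitted / reported if reported else 0.0


# Hypothetical reports (illustrative only).
example = [
    [{"car", "radio"}, {"car"}],             # radio omitted by one member
    [{"rug"}, {"rug"}, {"rug", "shoes"}],    # shoes omitted by two members, counted once
    [{"tires"}, {"tires"}],                  # full agreement
]
rate = omission_rate(example)                # 2 omissions out of 5 purchases
```

Since any single omission is enough for a purchase to be counted, a figure computed in this way describes how often at least one member failed to mention a purchase, not how often a randomly chosen member would have done so.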

The other figures in this table are even more startling. More than 
two-thirds of the purchases made by these families were not mentioned 
by one or more family members. For several categories 75 per cent or 
more of the purchases were omitted, the omission percentage generally 
increasing with family size. In other words, a survey conducted among 
these families seeking to ascertain major household appliance purchases 





8 There is always the possibility, however, that the one(s) omitting the purchase may have been cor-
rect, e.g., if the purchase were not actually made during the period mentioned in the question. This
possibility is taken up in a later section and, to anticipate the results presented therein, it does not ap-
pear to have been of much importance in this study.







over the previous six months would, with sufficient misfortune in the 
selection of family members for interviewing, have turned up less than 
half the purchases that were actually made. This, of course, is an 
extreme case, as the likelihood of interviewing all the “wrong” family 
members in practice is very small. The example points up the fact that 
the percentages cited in Table 4 represent, in effect, upper limits to the 
degree of omission that might have occurred on purchases reported 
rather than expected values.⁹ The latter are readily obtained, however,
if we are willing to make assumptions regarding which member of a 
family is interviewed in practice. Of the numerous assumptions that are 
possible, the following five alternatives would seem to include the great 
majority of consumer purchase surveys: 


1. Any male adult member of the family.

2. Any female adult member of the family.

3. Any adult member of the family.

4. The head of the family.

5. The wife of the head of the family or the principal female in the
   family, i.e., the one who does most of the everyday buying and
   planning.

The proportion of purchases in each category that would not have 
been picked up by a survey of the sample under study following, in
turn, each of the sample-member selection procedures outlined above 
is given in Table 5. The percentages under selection schemes 4 and 5 
are simply the number of items omitted in that category by the partic- 
ular individuals divided by the total purchases reported in that 
category. There is some duplication between these two selection proce- 
dures inasmuch as female heads of families would be interviewed under 
both schemes. 

In deriving omission percentages for the first two selection proce- 
dures, allowance has to be made not only for the omission of purchases 
but also for the probability of different male or female members being 
interviewed in families possessing more than one adult member of a 
given sex. This was done by assigning equal probabilities to the selec- 
tion of family members in such cases and adjusting the number of 
purchases omitted accordingly.¹⁰ A similar procedure was followed for





9 In this respect, however, the percentages are likely to be underestimates because no allowance is
made for purchases made by a family during this time but not reported by any family member at all.

10 Thus, if there were 25 families having two male adults, and no purchases of a particular type
were omitted in 10 cases, one purchase was omitted by one male in 10 other cases, and one purchase was
omitted by both males in five cases, the total number of omissions from the viewpoint of the first selec-
tion procedure would be 10. In the first ten cases there would obviously be no omissions. However, in
half of the second ten cases, and of course in all of the last five cases, one would expect to encounter
omissions.
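The adjustment illustrated in this footnote amounts to an expected-value calculation under equal selection probabilities. A minimal Python sketch follows, using the figures of the footnote's example; the function name and the tuple layout are assumptions made for the illustration.

```python
from fractions import Fraction


def expected_omissions(cases):
    """Expected number of omissions when one eligible member is picked at random.

    `cases` is a list of (n_families, n_eligible_members, n_members_omitting).
    Each eligible member of a family is assumed equally likely to be interviewed.
    """
    total = Fraction(0)
    for n_families, n_eligible, n_omitting in cases:
        # Chance that the member interviewed is one who omitted the purchase.
        total += n_families * Fraction(n_omitting, n_eligible)
    return total


# The 25 two-male-adult families of the example above.
cases = [
    (10, 2, 0),   # purchase omitted by neither male
    (10, 2, 1),   # purchase omitted by one of the two males
    (5, 2, 2),    # purchase omitted by both males
]
expected = expected_omissions(cases)   # 0 + 5 + 5 = 10
```

Exact fractions are used here only to keep the arithmetic free of rounding; ordinary floating-point division would serve equally well.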







TABLE 5

PROPORTION OF TOTAL REPORTED PURCHASES IN PARTICULAR
CATEGORIES OMITTED UNDER FIVE ALTERNATIVE
FAMILY-MEMBER SELECTION PROCEDURES

Category    Any Adult Male | Any Adult Female | Any Adult Member | Head of Family | Wife of Head, or Principal Female

Cars  28%  19%
Shoes  24  33
Clothing  30  36
Tires and tubes  54  38
Building materials  41  36
Furniture and rugs  34  39
Household furnishings  37  42
Personal items (mainly jewelry)  39  46
Major household appliances, incl. radios  31  21  26
Minor household appliances  40  31  35

Total*  38%  35%





* Includes purchases not classified above. 


the third family-member selection procedure. The percentages in 
Table 5 for the first three selection procedures incorporate this adjust- 
ment. It should be noted, however, that no explicit adjustments could 
be made for possible omissions on the part of adult family members 
not interviewed, which in effect imputes to them the same pattern of 
omissions as was found for the respondents. 

The omission percentages in Table 5 are clearly well below those in 
Table 4, as would be expected. It is also clear that the extent of omis- 
sion varies considerably by the type of purchase (undoubtedly reflecting 
in part or in whole the effect of size of purchase as well as type), and 
that for most types of purchases the extent of omission depends to a 
large extent on which family member happens to be interviewed. In 
the case of automobiles, tires and tubes, and building materials, in- 
terviews with a male member of the family would have produced sub- 
stantially more purchases reported—oddly enough, in all of these 
cases more complete purchase information would have been obtained 
from interviewing any adult male in the family than from speaking 







with the head of the household. For nearly all other durable goods 
purchase categories, however, more complete purchase data would 
have been obtained from an adult female member of the household, 
with little difference between the completeness of replies of just any 
adult female member or the principal female member of the household. 

Although this test indicates that accuracy of response on consumer 
purchases depends to a large extent on which family members are inter- 
viewed in a particular study, the extent of omissions in any case would 
seem to many to be shockingly high. Thus, no matter which of the five 
family-member selection procedures were employed, over 30 per cent 
of the durable goods purchases of this sample would not have been as- 
certained, and under some of the selection schemes the number of 
purchases made and not reported in several categories would have
actually exceeded the number reported. 

There are, however, at least four extenuating factors. One is the 
length of the period of reference. Had the purchase question related 
to, say, one month or two months rather than to six months, many 
fewer purchases undoubtedly would have been omitted. Second is the 
generality of the question itself. Despite the use of a checklist of du- 
rable goods, it is reasonable to assume that had the focus been on a 
specific commodity, or class of commodities, the rate of omission would 
have been lower. Third is the absence of any restrictions on the size of 
the durable goods purchases to be reported. As a general rule, one would 
suspect that the rate of omissions would be related inversely to the 
lower dollar limit of permissible purchases reported. Fourth is the 
possibility that some of the omissions were really correct and that the 
reported purchases in some cases were in error. This is a possibility 
which is discussed shortly. 

Characteristics of the purchase. Four items of information were solic- 
ited regarding each purchase reported. They were: 

a. The identity of the family member actually making the purchase, 

b. Which family member was most instrumental in getting the pur- 

chase to be made, 

c. How the purchase was financed, 

d. The reason for making the purchase at that particular time. 

As is evident from Table 6, the extent of agreement on these ques- 
tions among the different family members varies considerably from 
one question to another. It is highest on the method of financing the 
purchase, about which seven out of every eight families were in com- 
plete agreement, and lowest on the reason for the purchase and on 
which family member was most instrumental in effecting the purchase, 







about which slightly more than half of the families were in complete
agreement.¹¹ The latter are, of course, on a more tenuous basis than
the others, which would account for the lower degree of consistency in
their case.¹² It is nevertheless interesting to note that even in the case
of so simple a factual question as who made the purchase, different an-
swers would have been secured nearly 40 per cent of the time depend-
ing on which family member was interviewed.


TABLE 6

EXTENT OF AGREEMENT ON CHARACTERISTICS OF
PURCHASES REPORTED*

(Per Cent of Total Purchases to Which All Family Members
Gave the Same Answer)

Characteristic    Two-Answer Families | Three-Or-More-Answer Families | Total Sample

Member making the purchase  63%
Member most instrumental in purchase  57
Method of financing purchase  88
Reason for purchase  58




* Excluding in each case families in which one or more members did not answer or were not sure of 
the answer. 


Validity of the omissions. The analysis thus far has been predicated 
on the assumption that purchase omissions were in fact errors, that all 
reports of purchases made during the given period were correct. How 
valid is such an assumption? Could not some of the purchase reports 
be erroneous in the sense that those purchases were not made during 
the period studied? In utilizing survey data to estimate quantities of
particular products purchased (usually staples), it is a common char- 
acteristic to obtain estimates far in excess of known production data 
adjusted for inventory changes. The reason is primarily a phenomenon 





11 If families in which one or more members answered “don’t know” were included and “don't 
knows” counted as regular answers, the above percentages would be reduced by three to five percentage 
points. 

12 The wording of these questions was: 

“Now, as we all know, the person buying a good is not always the same as the one who is most in-
terested in having the purchase made. In the case of each of the above items, which member or members
of the family were most instrumental in getting the purchase to be made?” 

“Why were each of these purchases made at that particular time?” 

It should be noted that the reasons for the purchase were classified into broad categories: “needed
it,” “special sale,” “just wanted it” or “felt like buying it.” Had a finer classification been employed, the 
extent of agreement on the replies would undoubtedly have been considerably lower. 









known as “telescoping,” meaning a tendency to encompass within the 
reporting period purchases made at other (invariably earlier) times. 
One may well inquire, therefore, about the extent to which this phenom- 
enon may account for the purchase discrepancies noted earlier. 

To answer this question a supplementary survey of 100 additional 
randomly-selected families was undertaken during January and Feb-
ruary of 1955.¹³ Essentially the same approach was used as in the orig-
inal survey except that only the questions relating to past purchases
and to family characteristics were asked and the technique was intro- 
duced of letting the family members discuss their answers among them- 
selves after the interview had been completed. This enabled the in- 
terviewer to determine not only whether or not particular purchases 
had been reported correctly, but also the reasons for such omissions 
as were made. 

The supplementary survey yielded a much better response than the 
original survey, a factor that can probably be attributed to refinement 
of the initial contact. Of 90 eligible families (10 were one-adult families 
and hence not eligible), 16 could not be contacted (most had moved 
or were out of town), 15 refused, and 58—or 64 per cent of the eli- 
gibles—were interviewed, all but one being two-adult families. Members 
of these 58 families reported 277 durable goods purchases, and 164, 
or 59 per cent, were omitted by one or more members. This is a some- 
what lower percentage than that shown in Table 4, which may be due 
in part to the emphasis of this questionnaire on purchases. Discussion of 
the omissions after the interviews indicated the following reasons for
them:

                                                Per Cent of Total
                                                    Omissions

Lack of knowledge                                      18%
Forgetfulness                                          79
Telescoping; purchase not actually made                 2
Other¹⁴                                                 1

                                                      100%


Thus, for this sample in 98 per cent of the cases the purchase had 
been made and the omission was indeed an error. In view of the fact 
that there is no evidence to indicate that the replies of the sample in 





13 The possibility that telescoping might exist was overlooked by the author and was only brought
to his attention after the original survey had been completed. Since the original sample members had 
been told they would not be reinterviewed, a supplementary survey seemed the next best alternative. 

14 For example, in one case the purchase of a house was omitted although all arrangements had
been completed and the deal closed. The wife knew about it but felt that it was not a purchase in her 
sense because they had not yet moved in. 







this respect would have differed from those of the original sample, 
it does appear that telescoping was not an important factor in this 
study. Telescoping may well be of greater significance in the case of 
items purchased frequently than for items purchased only occasion- 
ally, as is true of most durable goods. 
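The percentages reported for the supplementary survey follow directly from the counts given in the text, as the short Python check below shows. The variable names are illustrative; the figures are those stated above.

```python
# Response to the supplementary survey.
eligible = 90
interviewed = 58
response_rate = interviewed / eligible      # about 0.64

# Purchases reported and purchases omitted by one or more members.
purchases = 277
omitted = 164
omission_share = omitted / purchases        # about 0.59

# Reasons for omission, in per cent of total omissions.
reasons = {
    "lack of knowledge": 18,
    "forgetfulness": 79,
    "telescoping; purchase not actually made": 2,
    "other": 1,
}

# Omissions in which the purchase had in fact been made: all but telescoping.
genuine_errors = sum(v for k, v in reasons.items() if not k.startswith("telescoping"))
```

The last figure treats the "other" category as cases in which the purchase had been made, in line with the example given for that category in the footnote.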

It is interesting to note from the preceding tabulation that in nearly 
a fifth of the omissions the reason was lack of knowledge rather than 
poor memory. The purchases omitted in these cases were mostly what 
one would suspect—tires and tubes, personal items, household furnish- 
ings. In four instances, however, the situation was encountered of a 
man and wife making most purchases independently of the other, and 
knowing of each other’s purchases only in the case of such “inescap- 
able” items as cars, houses and furniture. 

Purchase plans. Plans, of course, are more nebulous than actual 
purchases, and one would therefore expect more disagreement within a 
family with regard to purchase plans than with regard to purchases.
This is especially so in the present case because no attempt was made to 
define a purchase plan. Rather the objective was to use the term in 
much the same sense, and with much the same lack of explicitness, as it 
has been employed in many buying-intention surveys in the past, and 
in this way to obtain some idea of the reliability of responses secured on 
such surveys. The opening question on the subject of purchase plans 
therefore was phrased as follows: “Are any of you planning to buy any 
durable goods between now and —-—-——— (a period approximately 
six months hence)?” Succeeding questions asked the respondent to list 
what he or she believed each member was planning to buy, when the 
purchase might be made and how it was to be financed. 

The extent of disagreement within two-answer families on purchase 
plans reported is indicated in Table 7. As with actual purchases, dis- 
agreements are clearly widespread, but contrary to the former situa- 
tion, the family members differed most frequently by only one or two 
items rather than by many items. It also appears that, contrary to 
what might have been expected, the extent of disagreement on plans 
is less than on purchases (compare Tables 3 and 7). This is also true 
for the three-or-more-answer families. 

Both of these phenomena may well be explained, however, by the 
many fewer plans than purchases that were reported. Thus, the number 
of plans reported by two-answer families averaged 2.5 as compared with 





16 The two samples are quite similar with regard to family characteristics. The greater emphasis 
placed by this questionnaire on past purchases might possibly have reduced the likelihood of telescoping 
somewhat (though a bias in the opposite direction would not be unlikely either), but the effect is hardly 
likely to be substantial. 







TABLE 7

EXTENT OF AGREEMENT ON DURABLE GOODS PURCHASE
PLANS REPORTED BY TWO-ANSWER FAMILIES,
BY FAMILY SIZE

(Per Cent of Total Families in Category)

                                   Two-       Three-     Four-Or-More-
Extent of Agreement                Member     Member     Member
                                   Families   Families   Families

Agree on all items:
  Some plans                         3%         8%          3%
  No plans                          32         15          10
Differ by one item                  26         25          20
Differ by two items                 22         12          27
Differ by three items                4         15          14
Differ by four or more items        12         25          26

Total                               99%*      100%        100%

Sample size                         90         40          73





* Differences in totals from 100% due to rounding. 


an average of 5.2 purchases reported by the same families. In addition, 
a much larger proportion of families reported no plans at all than re- 
ported no purchases—20.7 per cent as against 2.9 per cent, respectively. 
For these reasons, there would obviously be much less room for dis- 
crepancies to arise in the case of plans than in the case of purchases. 

This is not to deny the fact that in a survey of this type the answers 
secured with respect to durable goods purchase plans would have 
possessed greater reliability than those secured on actual purchases. 
With respect to plans, the same answers would have been obtained 25 
per cent of the time irrespective of which family member was inter- 
viewed (taking the entire sample as a whole), and in 48 per cent of the 
time the answers would not have differed by more than one item. With 
respect to purchases, however, the same answers would have been 
obtained less than 8 per cent of the time from any family member, and 
the answers would have differed by more than one item for about four 
families out of five, depending on which family member was inter- 
viewed. 
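The whole-sample figures for plans quoted above appear to be reproducible by weighting the columns of Table 7 by their sample sizes. The Python sketch below assumes that this is how the pooled percentages were obtained, which the text does not state explicitly; it works from the rounded percentages printed in the table, and the names used are illustrative.

```python
# Sample sizes of the two-answer families by family size (Tables 3 and 7).
sizes = {"two": 90, "three": 40, "four_plus": 73}

# Table 7: per cent agreeing on all items (some plans + no plans),
# and per cent differing by exactly one item.
plans_agree = {"two": 3 + 32, "three": 8 + 15, "four_plus": 3 + 10}
plans_one_item = {"two": 26, "three": 25, "four_plus": 20}


def pooled(pct_by_group, sizes):
    """Sample-size-weighted average of group percentages."""
    n = sum(sizes.values())
    return sum(pct_by_group[g] * sizes[g] for g in sizes) / n


same_answers = pooled(plans_agree, sizes)                      # about 24.7 per cent
within_one = same_answers + pooled(plans_one_item, sizes)      # about 48.4 per cent

# Purchases, from the total column of Table 3: differing by more than one item.
purchases_more_than_one = 16 + 23 + 40                         # 79 per cent
```

These results agree with the 25 and 48 per cent cited for plans, and with the statement that answers on purchases would have differed by more than one item for about four families out of five.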

Essentially the same characteristics that were found to influence the 
extent of agreement on purchases reported were also found to influence 







the consistency of replies on durable goods purchase plans. As is evident 
from Table 7, agreement among the different family members on pur- 
chase plans becomes less frequent as the family size rises. Families 
with heads 50 years of age or over were significantly more consistent 
in their replies than families with younger heads, as was also true for 
low-income families (those earning less than $2,600 per year) relative 
to higher-income families. In both of these cases, however, the families 
exhibiting the greater consistency—families with older heads and low- 


TABLE 8

PROPORTION OF TOTAL REPORTED DURABLES PURCHASE PLANS
IN PARTICULAR CATEGORIES NOT REPORTED BY ONE OR
MORE FAMILY MEMBERS, BY FAMILY SIZE

                                                       Family Size
Category                                            Two      Three or more

Cars                                                 62%*       100%
Shoes                                               100          84
Clothing                                             84          87
Tires and tubes                                     100*        100*
Building materials                                  100          97
Furniture and rugs                                   95          94
Household furnishings (inc. kitchen utensils)       100          91
Personal items (mainly jewelry)                      91          95
Major household appliances inc. radio                91          84
Minor household appliances                          100          95

Total†  90%

* Less than ten plans in category.
† Includes plans not classified above.


income families—also reported fewer purchase plans. As before,
families that disagreed on their income or on whether or not they main- 
tained a budget also tended to be less consistent than other families on 
their purchase plans. 

The extent to which families disagreed on purchase plans for specific 
types of goods is indicated in Table 8. This table corresponds to Table 4 
on actual purchases and shows for each category of goods what propor- 
tion of the different total purchase plans reported by any respondent 
were omitted by members of the same family. The picture that this 
table supplies regarding omission of plans by the family members is 







even more striking than that provided by Table 4 on omission of pur- 
chases. On the whole, nine plans out of every ten reported were omitted 
by one or more family members. With regard to purchase plans for 
such items as tires and tubes, building materials, or minor household 
appliances, interviewing the “wrong” adults would have turned up 
almost none of these plans. Even in the case of such a major item as an 
automobile, an “unlucky” survey of these families would have uncov- 


TABLE 9

PROPORTION OF TOTAL REPORTED PLANS IN PARTICULAR
CATEGORIES OMITTED UNDER FIVE ALTERNATIVE
FAMILY-MEMBER SELECTION PROCEDURES

                                                                        Wife of
                                   Any      Any      Any       Head     Head, or
Category                           Adult    Adult    Adult     of       Principal
                                   Male     Female   Member    Family   Female

Cars                                27%      73%      48%       41%      73%
Shoes                               52       41       50        52       43
Clothing                            41       49       47        48       53
Tires and tubes                     32       79       56        29       79
Building materials                  46       60       54        48       57
Furniture and rugs                  52       45       50        51       44
Household furnishings (inc.
  kitchen utensils)                 57       38       49        58       37
Personal items (mainly jew-
  elry)                             55       45       51        59       53
Major household appliances,
  inc. radio                        30       57       44        33       59
Minor household appliances          52       48       52        59       48

Total*

* Includes plans not classified above.


ered only 14 per cent of the purchase plans for this commodity that did 
exist (and this may well be an overestimate because of plans in existence 
that may not have been reported by any family member). 

As was noted for the corresponding tabulation for actual purchases, 
the figures in Table 8 represent upper limits of the extent of omission
that might have occurred in a one-interview-per-family survey of these 
families, and not the rate of omission that might have been expected 
to occur in practice in such a survey. Estimates of the expected omissions
in actual practice are presented in Table 9 under each of the five
alternative family-member selection procedures outlined in connection
with the analysis of the consistency of replies on actual purchases
(p. 797). As was the case with purchases reported, the percentages shown
in Table 9 are considerably below those in Table 8. They are neverthe- 
less substantial and are generally much higher than the corresponding 
figures for actual purchases (Table 5). In other words, under any of the 
five family-member selection schemes tested, a much larger proportion 
of plans would have been omitted than purchases, both relating to 
periods approximately six months in length. In general, about half of 
the durable goods purchase plans would not have been reported irre- 
spective of which family member was interviewed (though, of course, 
not the same half in each case). 

Substantial differences are evident in the omission rates for different 
categories by sex of interviewee. In the case of auto purchase plans, 
interviews with female members of the household would have failed 
to uncover nearly three-fourths of the plans. Larger omission rates also
characterized the female members’ replies with respect to tires and 
tubes, building materials, and, oddly enough, major household appli- 
ances. On the other hand, more purchase plans for shoes, furniture and 
rugs, household furnishings, personal items and minor household ap- 
pliances were not mentioned by male family members. As for actual 
purchases, omission rates appeared to vary, with a few exceptions, more 
by sex than by the member’s status in the family. 
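The comparison by sex of interviewee can be checked directly against the first two columns of Table 9. The sketch below is an illustration only, not part of the original analysis: the dictionary `TABLE_9` copies the "any adult male" and "any adult female" omission rates from the table, and the helper names `better_informant` and `compare_by_sex` are hypothetical.

```python
# Omission rates (per cent) for purchase plans, copied from Table 9.
# Each entry is (any adult male, any adult female).
TABLE_9 = {
    "Cars": (27, 73),
    "Shoes": (52, 41),
    "Clothing": (41, 49),
    "Tires and tubes": (32, 79),
    "Building materials": (46, 60),
    "Furniture and rugs": (52, 45),
    "Household furnishings": (57, 38),
    "Personal items": (55, 45),
    "Major household appliances": (30, 57),
    "Minor household appliances": (52, 48),
}


def better_informant(male_rate, female_rate):
    """Return the sex whose interviews would omit fewer plans (hypothetical helper)."""
    if male_rate == female_rate:
        return "either"
    return "male" if male_rate < female_rate else "female"


def compare_by_sex(table):
    """List (category, male rate, female rate, female minus male, preferred sex)."""
    rows = []
    for category, (male, female) in table.items():
        rows.append((category, male, female, female - male,
                     better_informant(male, female)))
    return rows


if __name__ == "__main__":
    for category, male, female, gap, pick in compare_by_sex(TABLE_9):
        print(f"{category:<28} male {male:>2}%  female {female:>2}%  "
              f"gap {gap:+3d}  -> {pick}")
```

A positive gap means that interviews with female members would have omitted more plans in that category; a negative gap means the reverse. The size of the gap, not merely its sign, indicates how much is at stake in the choice of respondent.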

In examining Table 9, it should not be forgotten that the same 
qualifications attached to the estimates of the proportions of actual 
purchases omitted also apply to purchase plans. Had the period of 
reference been much less than six months, or had the question been 
focussed more directly on a specific category of goods, the omission 
percentages would undoubtedly have been considerably lower. 

Present economic status. Only one question, other than the one on 
income, was asked on the subject of present economic status. This ques- 
tion was whether the respondent thought that the family was finan- 
cially better or worse able to buy durable goods at the current time than 
a year ago. As is evident from Table 10, in nearly half of the families 
the answer varied depending on which person was interviewed. Even 
when those families are excluded in which the cause of the discrepancy 
was one or more members answering “not sure,” disagreements on the 
answer to this question remained in one-third of the families. To what 
extent these differences may be due to the inherent ambiguity of any 
general question on “ability to buy” rather than to difference of opinion 
is as yet an unanswered question. 







TABLE 10

EXTENT OF AGREEMENT ON FAMILY ABILITY TO BUY
DURABLE GOODS, BY ANSWERS PER FAMILY*

(Proportion of Families in Which All Members Interviewed
Gave Same Answer)

                                               Two-       Three-or-More-
                                              Answer         Answer
                                             Families       Families

Proportion of families in agreement            56%            44%
Proportion in agreement, excluding
  families where one or more members
  answered “not sure”                          68             50
Sample size*                                  197             34

* Excluding families in which one or more members did not answer the question.


Contrary to the findings with regard to the previous subjects, families 
whose members disagreed on income or on maintenance of a budget 
were not more likely to disagree on this matter as well. The same was
true for income and family size, the only socio-economic characteristics 
tested in this regard. 

Economic and political expectations. Three questions on expectations 
were included in the questionnaire. They were: 

“In the coming six months, do you expect your family income to be
higher, lower, or about the same as in the last six months?”

“Is there anything that you believe will be hard to get during the
next six months because of our mobilization program?”

“Do you expect our relations with the Communist nations to get
better or worse or not change during the next six months?”

As is evident from Table 11, the extent of family agreement on these 
questions is no better than on ability to buy durable goods, and in the 
case of one question—future relations with the Communist nations— 
appears to be considerably poorer. This is a question concerning which 
much uncertainty existed at the time the interviews were conducted, 
in 1951-52, and it is possible that the same question asked in late 1953 
or early 1954 would have evoked more agreement within the sample 
families—if for no other reason than that greater unanimity of opinion 
appeared to exist at the time regarding the international situation. The
reason why lesser disagreement was noted on such a related question
as the existence of possible shortages because of the mobilization program
may well be due to the fact that only two definitive answers to
this question were possible as against three for the other.16


TABLE 11

EXTENT OF AGREEMENT ON ECONOMIC AND POLITICAL
EXPECTATIONS, BY ANSWERS PER FAMILY*

(Proportion of Families in Which All Members Interviewed
Gave Same Answer)

                                              Two-     Three-or-More-
Type of Question                             Answer       Answer         All
                                            Families     Families      Families

Family income expectations: up, down, or
  about the same                              65%          56%
Possible shortages due to mobilization
  program: yes or no†                         67           37            64
Trend of our relations with Communist
  nations: better, worse, or no change        50           24            47

* Excluding families in which one or more members answered “not sure” or “don't know.” Sample
sizes are approximately the same as those shown in Table 10, except where indicated to the contrary.
† Based on 85 two-answer families and 8 three-answer families.
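The remark that a question with two definitive answers invites more agreement than one with three can be illustrated by a chance baseline. The sketch below is not a computation from the study. It assumes, purely for illustration, that each family member answers independently and chooses each admissible answer with equal probability; the function name `chance_agreement` is hypothetical.

```python
from fractions import Fraction


def chance_agreement(num_answers, num_members):
    """Probability that all members give the same answer by chance alone.

    Assumes independent members and equally likely answers (an
    illustrative assumption, not a finding of the study).
    """
    if num_answers < 1 or num_members < 1:
        raise ValueError("num_answers and num_members must be at least 1")
    # The first member may answer anything; each further member must match.
    return Fraction(1, num_answers) ** (num_members - 1)


if __name__ == "__main__":
    for answers in (2, 3):
        for members in (2, 3):
            p = chance_agreement(answers, members)
            print(f"{answers} answers, {members} members: "
                  f"{float(p):.1%} agreement by chance")
```

Under these assumptions two members would agree half the time on a yes-or-no question and one-third of the time on a three-way question, so part of the difference between the 67 and 50 per cent figures for two-answer families in Table 11 could arise from the number of answer choices alone.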

From the point of view of design of public opinion surveys, these 
findings have interesting implications. To judge by the data in Table 11, 
the intra-family correlations on many opinion questions may well be 
zero, if not negative, so that in such situations greater efficiency will 
be obtained by interviewing more than one family member rather than 
by scattering of interviews. 
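The efficiency argument can be made concrete with the standard textbook approximation for equal-size clusters, in which the variance of a sample mean is multiplied by 1 + (m - 1)r, where m is the number of interviews per family and r the intra-family correlation. This formula is not given in the article; the sketch below applies it under that assumption, and the function names and the example figures are hypothetical.

```python
def design_effect(members_per_family, intra_family_corr):
    """Variance multiplier 1 + (m - 1) * r for clusters of equal size m."""
    if members_per_family < 1:
        raise ValueError("members_per_family must be at least 1")
    if not -1.0 <= intra_family_corr <= 1.0:
        raise ValueError("intra_family_corr must lie between -1 and 1")
    deff = 1.0 + (members_per_family - 1) * intra_family_corr
    if deff <= 0:
        raise ValueError("correlation is too negative for this cluster size")
    return deff


def effective_sample_size(total_interviews, members_per_family, intra_family_corr):
    """Number of independent interviews giving the same precision."""
    return total_interviews / design_effect(members_per_family, intra_family_corr)


if __name__ == "__main__":
    # Hypothetical example: 200 interviews taken two per family.
    for r in (0.5, 0.0, -0.2):
        n_eff = effective_sample_size(200, 2, r)
        print(f"r = {r:+.1f}: effective sample size about {n_eff:.0f}")
```

With 200 interviews taken two per family, a correlation of 0.5 is equivalent to roughly 133 independent interviews, a correlation of zero to 200, and a correlation of -0.2 to about 250. When the correlation is zero or negative, nothing is lost in precision by interviewing within the family, and the saving in travel and contact cost is a clear gain.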


IMPLICATIONS OF THE RESULTS 


In essence, the foregoing findings may be synthesized into three
general propositions. These are:

1. The degree to which the attitudes and expectations of one member
represent those of other family members is not high.

2. A particular family member generally does not have complete
knowledge concerning the durable goods purchases and purchase
plans of other family members over a period of approximately
six months. The extent of this phenomenon appears to increase
for larger-size families.

3. In quite a few families, information obtained regarding family
status and characteristics will differ according to which family
member is interviewed.


16 In accordance with the findings regarding present economic status, no statistically significant
relationship was found to exist between extent of agreement on any of these questions and family size,
income, and extent of agreement on income or on maintenance of a budget.

These propositions can not be considered to be fully established on 
the basis of this one study, but they represent, at the least, hypotheses 
worthy of further investigation. The first proposition in particular is 
on tenuous ground not only because of its being based on the results of 
one sample but also because of the limited extent to which the questions 
asked on this subject in the survey are representative of the multitude 
of questions falling under that heading. 

The findings of this study are nevertheless so conclusive in them- 
selves as to cast serious doubts on the reliability of information ob- 
tained on many consumer surveys by the conventional method of inter- 
viewing one family member or any family member that answers the 
door. More important, however, is the light provided by this study on 
how the reliability of information obtained on such surveys might be 
improved, namely, in the following ways: 

1. Do not treat economic or political attitudes and expectations of 
one family member as representative of those of the family as a whole 
except if each (adult) member’s reply is obtained and all answers 
coincide. Unless all family members’ replies are obtained, each reply 
is best treated as an individual opinion rather than a collective one. 

2. The same is true for family characteristics and status. Such information
is best obtained from at least two family members, and preferably
all the adult family members.

3. Six months would seem to be too long a period for which to solicit
information regarding durable goods purchases or purchase plans. This
may well be true even if all the family members are interviewed, al- 
though no information bearing directly on this point is obtainable from 
this study. 

4. If, nevertheless, such information for a period covering six months 
or longer is required, the interview is best conducted with all adult 
family members present and with the questions focussing on particular 
goods. In fact, one could well assert that, at a given cost, expenditure 
of resources for more complete interviews with each family would lend 
greater over-all validity to results of such surveys than use of the same 
resources to increase sample size (in terms of number of families or 
household units). 

5. Completeness of response on purchases and purchase plans varies 
by type of good and depends more on the sex of the (adult) interviewee 
than on family status. In other words, in a given study where only one
family member is to be interviewed, the primary determinant of which
member should be chosen is which sex is likely to be more concerned
with the purchase of that particular good or category of goods. To
judge from this study, the choice in the case of the goods studied might
be as follows:


Male                              Female
Cars                              Shoes
Tires and tubes                   Furniture and rugs
Building materials                Household furnishings
Major household appliances        Major household appliances
  (for plans)17                     (for purchases)17
                                  Minor household appliances


For personal items and clothing, interviews may well have to be interpreted
on an individual basis, i.e., not related to the family as a
whole. It should also be noted that the above categories are broad
ones, and that the choice for a particular item in a category might not 
necessarily be the same as for the entire category. 

To some extent the margin of preference in a particular case will be 
decreased as the period of reference is reduced. If information regarding 
purchases or plans is desired for as brief a span as one month, the above 
classification may not possess too much relevance. 

Above all, this study points up the possibility that the securing of
reliable data at least in consumer purchase studies is much more
difficult than has heretofore been supposed. The use of an individual as 
a spokesman for the family or household unit combined with the often 
implicit acceptance of the reliability of the replies pertaining even to 
the individual are major sources of sample bias. 





17 This may not be as paradoxical as it may seem at first sight. Evidently, the males may have made
the plans, but the females make the most use of these appliances, thereby leaving such purchases fresher
in their minds.





THE COCHRAN-MOSTELLER-TUKEY REPORT ON THE 
KINSEY STUDY: A SYMPOSIUM* 


ALFRED C. KINSEY AND ASSOCIATES; HERBERT HYMAN AND PAUL B. SHEATSLEY;
A. H. HOBBS AND R. D. LAMBERT; NICHOLAS PASTORE AND
JACOB GOLDSTEIN; LEWIS M. TERMAN; PAUL WALLIN;
W. ALLEN WALLIS; AND WILLIAM G. COCHRAN,
FREDERICK MOSTELLER, AND JOHN W. TUKEY


Editor’s Note: Because of the unusual character of the CMT report, an
appraisal officially sponsored by organized scientific bodies (the American
Statistical Association and the National Research Council), we are adopting
an unusual form of review. Kinsey and his associates have been invited to prepare
the main review, each of the six critics of the Kinsey report whose criticisms
are considered by CMT has been invited to contribute not more than two pages
devoted primarily (but not necessarily exclusively) to indicating whether he
accepts the CMT position on his criticisms, and CMT have been invited to submit
a final word on all these statements. All the statements were circulated among
all the participants before publication. W.A.W.


* * * 


ALFRED C. KINSEY AND ASSOCIATES, University of Indiana


The CMT report on the statistical handling of the data in our volume
on Sexual Behavior in the Human Male now appears some eight
years after the manuscript for that volume was completed, and some 
fifteen months after our second volume, Sexual Behavior in the Human 
Female, has come into print. We are indebted to Cochran, Mosteller, 
and Tukey for suggesting some of the modifications which were in- 
troduced in handling the data in our second volume. We are especially 
indebted to Harold Dorn and Jerome Cornfield who, as statistical con- 
sultants in the preparation of that second volume, helped us develop 
the forms in which we presented our data, and guided us in making 
more guarded interpretations of our material. 

Most of the suggestions made in the CMT report were distinctly use- 
ful, and we utilized a large number of them—specifically something 
more than a hundred of them—in preparing our second volume. We 
disagree with the CMT report on only one major issue, namely the 
practicality of obtaining a probability sample in our area of research. 
Because of the sensitive nature of the subject involved, we do not
believe that probability sampling is practical in any extensive study of
human sexual behavior which attempts to survey the whole population
of a large city, a state, or the United States.


* A review article on Statistical Problems of the Kinsey Report, by William G. Cochran, Frederick
Mosteller, and John W. Tukey, with the assistance of W. O. Jenkins. Washington, D. C.: The American
Statistical Association, 1954. Pp. x, 338. $5.00.


To quote and paraphrase from the statement which we have already
published (pp. 25 to 31 in our volume on the female) we may note again
that our difficulty stems primarily from the fact that most persons
have engaged in sexual activities which are socially taboo and ofttimes
legally punishable under the existent laws. Even premarital and
extra-marital intercourse are criminal acts in many states, and mouth-genital
contacts, homosexual activities, animal contacts, induced
abortions, all contacts with persons below the age of 18 or even 21, and
still other types of sexual activity are punishable in all or nearly all of
the states of the Union. Any public disclosure of such behavior may
lead to a considerable loss of social prestige, and (in this day of governmental
concern over the private lives of its millions of employees) to a
loss of employment, even if it does not lead to criminal action. Consequently
few persons discuss such matters with anyone, even including
their spouses, their clinical counselors, their religious advisors, or their
most intimate friends. Certainly they are not easily persuaded to discuss
such matters with an interviewer representing a scientific survey.

Our primary problem, therefore, and the one which few of the reviewers
of our work have adequately recognized, has been to discover
some process by which we could persuade our subjects to divulge their
sexual activities with such fullness and completeness as their memories
would allow. To do this, we have had to give prime attention to establishing
rapport and developing effective interviewing techniques. Only
then has it begun to be possible to select the sample with which we
wished to work.

Sometimes it has seemed that the requirements of accurate sampling 
were antagonistic to the requirements for adequate reporting. But we 
have not wanted a representative sample of unreliable answers, and 
neither have we wanted reliable answers from respondents who repre- 
sented nobody but themselves. For this reason we have substituted, 
for the usual methods of probability sampling, a method of group 
sampling through which we have tried to secure representatives of a 
number of the components of the larger population in which we were 
interested. By making our approaches to individual subjects through 
the groups to which they belonged, and by working as long as necessary 
to establish our reputation in each group, we have been able to develop 
group acceptances of our project—after which it became possible to 
persuade the individual members, and in many instances a hundred 
per cent or nearly a hundred per cent of all of the members of each 
group, to cooperate. 























Obviously, these problems are quite different from those which are 
involved in a study of the incidence of an insect infestation in a corn- 
field, or the frequency with which a given population attends moving 
picture performances, or the number of tires which the average house- 
holder owns for his automobile. Even economic, social and political 
surveys rarely deal with data which are as sensitive as those with which 
we have had to deal. We feel certain that an approach to the lone in- 
dividuals who might have been the pre-selected respondents in a 
probability sample (in a study of any large population) would have led 
to such refusal rates that the sample would have been worthless. More- 
over, even those who would have agreed to talk to us would not have 
given such full and complete information as we have secured through 
our method of group sampling. This appears to be the explanation of 
the fact that some of the statistically best planned studies on even less 
sensitive aspects of human sexual behavior have arrived at generaliza- 
tions which are completely remote from the experience of clinicians, 
trained social workers, and all of the case-history studies. Their error 
obviously stems from the fact that they have given prime attention to 
their sampling patterns and inadequate attention to the problems of 
interviewing. 

To suggest, as some persons have, that our experience should now 
make it possible for us to secure a good probability sample, indicates 
some failure to comprehend the especial difficulties which we have 
faced and still face. Even in such closely knit communities as church 
memberships, college classes, city rooming houses, prison groups, and 
fraternal organizations, it has usually taken months or even years 
to establish the community interest which has finally made random 
sampling possible. For instance, we worked for four years in the Cali- 
fornia State penal institutions before we were able to lower the refusal 
rates on a pre-selected sample to the 5 or 6 per cent rates at which they 
have stabilized within the last two years. 

There is no major area of human physiology, psychology, or psy- 
chiatry in which our knowledge has been less adequate than in the 
area of sexual behavior. For this reason, we have never intended to 
confine our research to a study of the incidences or the frequencies of 
the various types of behavior in these United States. From the beginning
we have attempted to secure data on the anatomy and physiology
of sexual response (Chapters 14, 15, and 17 in our volume on the
female), on the relation of hormones to sexual response, on the factors 
which account for particular patterns of sexual behavior, on the sig- 
nificance of each type of behavior in the social organization, on the 
problem which society faces in attempting to control the social behavior
of its individual members (sex law and sex offenders), and on still other
matters. Not more than 20 per cent of the information which we have
so far gathered has been brought together in the two volumes which are
now published. Data which we already have will be the subject matter
of later publications on sex law, on sex offenders, on juvenile sexual 
delinquency, on prostitution, on transvestism, on homosexuality, on 
the relation of drug addiction to sexual activities, and on still other 
matters. In all of these areas we should be able to add to our knowledge, 
but in every one of these areas other investigators will have to make the 
more extended studies which we, in one life time, shall not be able to 
make. We have chosen, in view of our present lack of knowledge on 
most of these matters, to make a general survey of the whole area of 
human sexual behavior. 

The present discussion of the statistical problems which we have 
faced may contribute materially to the effectiveness of our own further 
work, and to the planning and execution of the research which other 
groups may choose to undertake. But these discussions will have some- 
how failed if they do not point up the difficulties which are sometimes 
involved in applying statistical ideals to particular areas of research. 


* * *


HERBERT HYMAN, Columbia University, and PAUL SHEATSLEY,
National Opinion Research Center1


The Editor has asked us to comment on the monograph by Cochran,
Mosteller, and Tukey on Statistical Problems of the Kinsey Report;
more particularly to express our reactions to their treatment of our
earlier review of Kinsey. On this latter score, our comment will be
brief. We find nothing to criticize in their remarks about our review.
To be singled out for such detailed attention and occasional criticism
is to feel honored and to profit. On the larger treatment of Kinsey by
these writers, we have elsewhere2 remarked in print on the soundness
of their criticism and the balance and wisdom they exhibit. Having
attempted such a review ourselves makes us sensitive to the great
difficulties involved. Cochran, Mosteller, and Tukey have done a major
work, far exceeding any man’s ordinary conception of criticism or review.
We should like to address our remaining remarks to the larger
values of their work.





1 “The Kinsey Report and survey methodology,” International Journal of Opinion and Attitude 
Research, Vol. 2, No. 2 (1948), pp. 183-95. 

2 Hyman, Herbert H., “Sexual behavior in the human female—a special review,” Psychological
Bulletin, 51 (1954), p. 419. 







In one section of the present monograph, six previous reviews are
examined and the comments of the reviewers are grouped under headings
corresponding to the major phases of the research process, e.g.,
sampling, interviewing, etc. This content analysis, in a sense, provides
a rare, perhaps unique, opportunity to see what elements of subjectivity
or idiosyncrasy characterize present-day technical standards
in research and to what extent agreement exists on standards. If
evaluation were capricious and if standards varied, it would provide
implicit evidence that no investigator would know how to proceed.
Principles for the conduct of inquiries in the behavioral sciences would
be lacking. In our opinion, this content analysis of the six earlier reviews,
plus the parallel treatment by a seventh, Cochran, et al., establishes
that there is a body of governing principles. Seven expert witnesses
are in fundamental agreement. But it will also be clear to the
reader that in some places our present principles fall short. The witnesses
don’t always agree. The principles either provide no clear standard
as to correct practice or they point to a technical problem demanding
solution, without being able to point the direction of that solution,
given the exigencies of field study of a complex problem. To know the
limits of current knowledge is of value in its own right. But here is
where Cochran, Mosteller, and Tukey provide another valuable lesson.
In their suggestions for solution to some of these problems they instruct
the student in that blend of principles and modifications, in that
simultaneous orientation to ideals and realities, in that tentativeness
rather than dogmatism, which mark the best research practice.

Apart from these larger considerations, there is much more to be 
noted in this monograph which is of general value. For example, Ap- 
pendix G on Principles of Sampling, published separately in this 
Journal (vol. 49, 1954, pp. 13-35), can be read apart from the study 
of Kinsey and constitutes a brief, clear and valuable general reference. 

The magnitude of this critical work is appropriate for the importance 
of Kinsey’s own work. But Kinsey apart, a documented exposition of 
principles of research in the context of the case study of any research 
inquiry provides an ideal vehicle for training. This is such an exposi- 
tion. 

For these and other reasons, Cochran, Mosteller, and Tukey are to 
be commended on the conduct of their committee inquiry and their 
monographic report.







A. H. HOBBS AND R. D. LAMBERT, University of Pennsylvania


Within the important but restricted territory encompassed by their
commission from the National Research Council to evaluate the
statistical methodology of the first Kinsey Report, Cochran, Mosteller
and Tukey have carefully surveyed an uneven terrain and posted appropriate
warning signs around the principal pitfalls. Since the committee
which requested this evaluation also “... wished to make
constructive advice available to Dr. Kinsey’s group,” CMT designed
a blueprint which might enable KPM to stay out of some of their
methodological dead-end streets in future volumes.

In an interesting commentary upon the scientific status of such investigations
of human behavior, CMT acknowledge that a judicious
selection of topics from the six critical reviews which constitute the
groundwork for their analysis would have enabled them to write “...
two factually correct reports, one of which would leave the impression
with the reader that KPM’s work was of the highest quality, the other
that the work was of poor quality and that the major issues were
evaded.” It is our impression that the authors were notably successful
in their stated attempt to avoid both of these extremes. In an over-all
view their evaluation is comprehensive in technical coverage, judicious
in interpretation, and considerate of all concerned in its critical commentary.
Regrettably, but understandably, the nature of their commission
permits only brief and incidental mention of criticisms which
others regard as trenchant and produces an analysis which, by weighing
items separately and on specifically statistical scales, fails to measure
the cumulative weight of the KPM conclusions.

Though some aspects of the critical interpretations are questioned
by CMT, they and the critics seem to agree in substance that a grave
omission was involved in KPM’s failure to include basic data to permit
readers to know the actual number of cases in each subgroup and in
cross-groupings, and to enable other investigators to evaluate the conclusions;
that failure to specify the questions used in interviews was a
more understandable but still serious handicap in attempts to evaluate;
that the use of volunteers and contact men almost certainly distorted
the sample, as evidenced by large disproportions within the sample
and between the sample and the unsampled white male population;
that KPM, therefore, had no warrant to project their sample to the
entire U. S. white male population; that in view of the obviously
non-representative sample, too many sweeping generalizations were based
on small numbers of cases; that the checks on the sample and on the
procedures were inadequate; that the meaning of many tables is obscure,
with the number of actual cases in various subgroupings undefined
and shifting; that textual statements sometimes do not conform
with tabular data; that the accumulative incidence technique requires
a degree of stability of behavior and representativeness in the sample
which is not present; and that conclusions based on comparisons of
successive “generations” are questionable. Thus, in substance, CMT
agree with most of the crucial statistical limitations emphasized by the
critics.


1 “An evaluation of ‘Sexual Behavior in the Human Male’,” American Journal of Psychiatry, 104
(1948), pp. 758-64.

Despite their objectivity and their eminent fairness to both KPM 
and the critics, the authors, restricted as they were, are compelled to 
scan a wide area through a narrow lens. Most of the critics, ourselves 
included, tried to view Kinsey’s first volume from a perspective of 
erroneous impressions which were likely to be conveyed to its wide and 
statistically naive audience, even though based upon technically 
acceptable procedures. Thus when our statements about the misleading 
nature of the accumulative incidence technique are questioned, and 
when CMT contend that our criticisms of the active incidence data 
lack relevance (p. 127) the disagreement, we believe, arises primarily 
from the differing perspectives through which these procedures and the 
conclusions based upon them are viewed. 

Our original statement relating to this issue reads: “To ascertain the 
number in any age group who are engaging in the specific activity, use 
of the accumulative incidence technique would presumably necessitate 
a high degree of representativeness for all significant factors within the 
given age group... .” CMT commented that “these criticisms might 
be relevant if KPM had used accumulative incidence as if it were 
equivalent to active incidence.” What we meant to refer to was not the 
accumulative incidence, but the distribution of a few cases over a 
number of age groups in both the accumulative and active incidence. 
KPM used recall data to place an individual in each age group younger 
than the one into which he falls at the time of interview. Where the 
upper age groups are not representative for important background 
characteristics correlated with the dependent variable, bias is in- 
troduced through each of the lower age groups. When we pointed out 
that the upper age groups were highly unrepresentative on marital 
status which is associated with variations in sexual practice, CMT 
answered that KPM were committed only to relatively equal groups 
which did not require representativeness within each category. This
we would agree with if the cases from one age group were not used in
preceding age groups, thus removing independence.

To indicate what we have in mind, we will work through part of an 
active incidence table (p. 652) which gives the percentages of single 
males by age and educational group that are in each of the homosexual 
rating groups. On KPM’s scale 4+ includes those who are more homo- 
sexual than heterosexual in their contacts and/or psychic reactions, and 
we will confine our example to them. (It might be added that this table 
could hardly be more exasperating to anyone interested in checking 
the computations. The N’s are erratic, the totals do not total, the age 
groups differ from one educational level to another, the matching 9-12 
educational level for married males is omitted, and when percentages in 
each homosexuality rating are applied to the total number of cases 
whole numbers do not emerge.) 

Since KPM fail to give the number of cases in any age group at the 
time of interview, let us assume for the purposes of illustration that 
those who report themselves as 4+ homosexuals at the upper age group 
fell under the same rating in the earlier age groups. It would appear 
that this assumption is not too unreasonable, but even if it were un- 
likely, the possible bias explained below would only become more 
variable. Consider the following table computed from KPM’s table on 
page 652 for the single males of educational level 13+. 


ACTIVE INCIDENCE OF SINGLE 4+ HOMOSEXUALS BY AGE
FOR EDUCATIONAL GROUP 13+

(1)    (2)       (3)              (4)          (5)                 (6)
Age    No. of    Cases in         No. of 4+    No. of 4+ Homos.    4+ Homos. at age 35 /
       Cases     x(t) - x(t+1)    Homos.       in x(t) - x(t+1)    4+ Homos. at a given age

15     2846      540              273          135                   8.1%
20     2306      1619             138          56                   15.9
25     687       508              82           37                   26.8
30     179       108              45           23                   48.9
35     71        71               22           21                  100.0

x = number of cases at a given age.


It will be apparent that those in age 15 who do not appear in age 20 
(column 2) are divided into two groups; some were interviewed between 
the ages 15 and 20 and some have passed into the married group and 
thus at age 20 will no longer be in the sample of single males. Since 
KPM give no breakdown of interviewees at age of interview, it is im-
possible to estimate the per cent of the “lost” cases that are due to each
factor. Even if these data were given, a proper analysis would require
that the ratios of 4+ homosexuality in each of the two categories of
“lost” cases be given. With these unknowns it is impossible to estimate
the contribution of single males at older ages to the per cent of 4+
homosexuals at the earlier ages.

We can, however, calculate the per cent of the total number of 4+
homosexuals at each age which are contributed by the oldest age group
(column 6) under the assumption of stability of rating from the earliest
to the oldest ages. Some measure of the significance of this contribution,
perhaps in terms of the mean square error of each age group, would
have to be devised, but it is greatest where the cases in a given educa-
tional level at the time of interview are concentrated at the older ages
and/or where the total number of cases is small. A similar calculation 
for the educational level 9-12 where the cases are fewer will indicate 
an even greater contribution from the older age groups. 


ACTIVE INCIDENCE OF SINGLE 4+ HOMOSEXUALS,
EDUCATIONAL LEVEL 9-12

Age    No. of Cases    No. of 4+ Homos.    Age 30, 4+ Homos. /
                                           4+ Homos. at given age

15     629             109                  22.9%
20     350             70                   35.7
25     140             43                   58.1
30     67              25                  100.0
35     (not given)

x = cases at a given age.
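
The arithmetic behind the two tables can be retraced with a short sketch. It is an illustration only, not a procedure used by KPM or by the writers: it takes the counts as printed in the tables, assumes (as the text does) that every 4+ case in the oldest group held the same rating at each earlier age, and recomputes the “lost” cases between successive ages and the per cent of 4+ cases contributed by the oldest group. The function and variable names are illustrative.

```python
def lost_between_ages(counts):
    """Cases present at one age but absent at the next.

    The oldest age has no successor, so it keeps its own count.
    """
    return [a - b for a, b in zip(counts, counts[1:])] + [counts[-1]]


def oldest_group_share(four_plus):
    """Per cent of the 4+ cases at each age that come from the oldest group.

    Assumes stability of rating from the earliest to the oldest age.
    """
    oldest = four_plus[-1]
    return [round(100.0 * oldest / n, 1) for n in four_plus]


# Educational level 13+ (ages 15, 20, 25, 30, 35), as printed in the first table.
cases_13 = [2846, 2306, 687, 179, 71]
four_plus_13 = [273, 138, 82, 45, 22]

# Educational level 9-12 (ages 15, 20, 25, 30), as printed in the second table.
four_plus_9_12 = [109, 70, 43, 25]

print(lost_between_ages(cases_13))          # column (3) of the first table
print(oldest_group_share(four_plus_13))     # column (6) of the first table
print(oldest_group_share(four_plus_9_12))   # last column of the second table
```

The recomputed shares agree with the printed percentages, which suggests that the final column in each table is simply the oldest group's 4+ count divided by the 4+ count at the given age.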


There is nothing illogical about the position that those predomi- 
nantly homosexual should stay single, or to put it in KPM’s terms that 
the group age 35 and still single at the time of interview would have 
been more likely to be predominantly homosexual at age 15, 20, 25, and 
30. The problem is that the older age groups (married and single) are a 
part of each of the younger age groups, and if they are disproportionate 
on an important variable such as marital status, then they should not 
be thrown into the lower age groups with an implicit weighting of one. 
If there is an undue proportion of single males at the 35 age level, then 
they should be weighted on some such pattern as the U. S. corrections
for the relevant universe before they are thrown in with cases inter-
viewed at younger ages. The relatively equal sub-samples are useful for
representing frequencies or proportions in a given cell, and with ade-
quate corrections as a contribution to a weighted mean. It seems to us
to introduce dangers of bias when a nonproportional sample from an
older age group is used to fill out younger age cells. To dramatize the
problem, consider the hypothetical case where 95 per cent of those in-
terviewed at age 35 were single. If enough cases were involved, pre-
sumable correction factors could make this a reasonable estimate for
homosexuality at age 35. But when these cases appear in the 20 year
age group, they are corrected only in terms of the proportion which
are single at twenty in the relevant universe. The unusual character of
these cases is lost when they are thrown into the twenty year group.
Thus we argue that a system of weighting must be developed which
will take into account not only disproportions of background factors
in the specific age group in question, but also where the individual fits
at the time of interview if he is to be used in other cells for earlier ages.
Even if the assumption of stability of rating is to be abandoned (and 
such data should already be available to KPM) weighting for back- 
ground factors at the time of interview must be used at earlier age 
groups. In the case used in the example, the bias appears to be upward, 
but in other sampling distributions it could underestimate the inci- 
dence. 
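
A small numerical sketch may make the weighting argument concrete. Every figure in it is hypothetical and chosen only for illustration; none comes from KPM or CMT. Persons interviewed at age 35 are split into single and married, each stratum is given an assumed rate of the behavior at age 20, and their contribution to the age-20 cell is computed twice: once with the marital mix of the sample, and once with an assumed marital mix of the relevant universe at age 35.

```python
def pooled_rate(shares, rates):
    """Weighted mean of stratum rates; the shares must sum to one."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(shares[k] * rates[k] for k in shares)


# Hypothetical rates of the behavior at age 20, by marital status at age 35.
rate_at_20 = {"single": 0.30, "married": 0.05}

# Hypothetical marital mix among those interviewed at 35 (the 95 per cent case).
sample_mix = {"single": 0.95, "married": 0.05}

# Hypothetical marital mix at age 35 in the relevant universe.
universe_mix = {"single": 0.15, "married": 0.85}

unweighted = pooled_rate(sample_mix, rate_at_20)    # implicit weighting of one
reweighted = pooled_rate(universe_mix, rate_at_20)  # weighted by status at interview

print(f"contribution with sample mix:   {unweighted:.4f}")
print(f"contribution with universe mix: {reweighted:.4f}")
```

With these made-up inputs the sample mix gives a much higher figure than the universe mix, which is the upward bias described above. Reversing the assumed rates would produce an underestimate instead.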

After agreeing that the KPM statements relating to homosexuality 
(p. 665) are carelessly worded, CMT object (p. 145) on similar grounds 
of careless wording to our statement that KPM’s references to high 
rates of “homosexuality” “... refer to an activity which may have 
occurred no more than once during a lifetime. . . .” Here, again, per- 
spective is an important factor in the interpretation. It was (and is) 
our impression that KPM rationalized to build up the rates and em-
ployed the term “homosexuality” in a manner likely to give an er-
roneously extreme impression of the magnitude of this form of behavior.
On such bases, and being limited to the confines of an article for a
non-statistical journal, we felt justified in employing a common rhetori-
cal device as an antidote to the extreme impression conveyed by KPM.
Admittedly, our statement is strong, but it is not careless, and on the
bases of the ambiguous definitions and of the absence of crucial data
which are withheld on the topic we believe that it can be hypothetically
rationalized in the same manner as KPM’s reiterated and emphasized
equally strong statements of opposite tenor to which it was offered as a
counter-play.

Persons are rated as 1’s on KPM’s scale of “homosexuality” 


. .. if they have only incidental homosexual contacts which have involved 
physical or psychic response, or incidental psychic responses without physical
contact. The great preponderance of their socio-sexual experience and reac-
tions is directed toward individuals of the opposite sex. Such homosexual
experiences as these individuals have may occur only a single time or two... 
(pp. 639 and 641, emphasis added). 


Definitions of 2’s and 3’s also include persons who have had either 
physical or psychic responses; either actual experience, or “reactions.” 
Even the 4’s, designated as being “predominantly homosexual,” are 
defined as those who “... have more overt activity and/or psychic 
reactions in the homosexual. . . .” So long as the definitions include 
those with psychic reactions and those with only incidental experience, 
the failure of KPM to include crucial definitive data leaves open the 
possibility that actual homosexual behavior may have occurred no 
more than once even though the percentages are based on a three-year 
interval. In the case of some, indeed, it is quite possible that homo- 
sexual activities were never practiced. (See KPM, p. 623.) 

Further explanation of points on which CMT question our criticisms 
would merely reiterate the obvious fact that they viewed KPM from a 
different angle than did we. 

CMT’s recommendations for future volumes are, of course, excellent. 
We agree that a probability sample, even a small one, is urgently 
needed, and that the per cent and characteristics of refusals is crucial. 
Aside from the question of taxonomically describing the behavior of 
some universe, for instance the white males in the U. S., the data could
be used to make comparisons among variables within the sample. To 
this end, CMT’s suggestions for merging the background variables into 
a few composites with high internal correlation and the construction 
of additive scales not anchored to zero points is a significant contribu- 
tion. In KPM the clinical tables as currently presented are almost use- 
less. An individual might find that his “normal” incidence varies ac- 
cording to the background characteristic he used to locate himself in 
the table. Composite variables would make the classifications more 
useful in both individual and group comparisons. 

CMT’s formulas which are intended to correct sample estimates for 
cluster and to estimate optimal adjustment for background character- 
istics are important ventures. Their use seems limited, as they them- 
selves have noted, in so improbable a non-probability sample as 
KPM’s. We would welcome an example working through these tech- 
niques for a cell or group of cells to give us some idea of the magnitude 
of possible bias involved. We wonder, however, if KPM’s method of 
expanding the cases does not call for some longitudinal as well as cross-
sectional estimates of error. Perhaps CMT had in mind using the
cases only in the cells appropriate to the time of interview. 





As a contribution to the general theory of error in non-probability 
sampling, the appendices are most rewarding. Within the limits im-
posed by their commission, the authors have performed a real service,
not only in combining and evaluating criticisms of the first volume by 
KPM, but in developing and describing theoretically imaginative and 
interesting suggestions for other investigators as well. 


* * * 


NICHOLAS PASTORE, Queens College,1 and JACOB GOLDSTEIN,
New School for Social Research1


WE ARE impressed with the judiciousness of CMT’s appraisal both
of the KPM volume and of the comments made by the various
critics. We shall confine ourselves to a few constructive points.

In line with Kinsey’s biological orientation and previous work in the 
field of biology, KPM avowedly accept the “taxonomic” approach in 
their assessment of the sexual behavior of the American population. 
Whether a biological orientation necessarily leads to such an approach 
is, of course, another question. In any case, we think that a taxonomic 
approach, which we identify with the classification of behavior (what 
is the actual behavior of Americans) without including propositions 
concerning a causal analysis of data, is inherently self-limiting. Since
considerable theoretical discussions and empirical work on sexual be- 
havior had preceded KPM, they could have tapped this source for 
systematically formulating a point of view in advance of their own in- 
vestigation. Such a systematization would enable the investigator to 
decide and define the character of the critical variables, the items to be 
selected for measuring the key variable (in this case, sexual behavior), 
and the method of statistical analysis in so far as the selection of the 
variables is concerned. KPM write that the number of theoretic groups 
in the 12 way breakdown they presented is “nearly two billion.” A 
systematic formulation of problems or hypotheses at the outset would in 
itself considerably restrict the number of breakdowns of interest 
(quite apart from the availability of subjects in the various cells). We 
are informed that 300 items “have been explored” in each history. 
Yet only a restricted portion of the accumulated data is used (see pp. 
63 ff. of KPM for examples). Why collect such data? Again, aside from 
realistic considerations (amount of time is finite and we can’t be in-
terested in everything), a definition of the problem in advance would
have helped to determine the choice of interview items. Reduction in
the number of items would have enabled KPM to explore other facets
of behavior which would have served to make the results more usable.
Information as to the psychological significance of an activity would
be important.


1 “Sexual behavior of the American male: a special review of the Kinsey report,” The Journal of
Psychology, 26 (1948), pp. 377-62.

The taxonomic approach forces the investigators to use ad hoc 
methods in analyzing data. Perhaps the investigators try one method 
of analysis or another until something of interest comes up. We remain 
uninformed about attempted and rejected analyses from the potentially 
large pool of analyses. In Vol. I we find a discussion of the masturbation 
habits of members of professional classes. In Vol. II there is no corre- 
sponding discussion. Again, in Vol. I we find considerable discussion of 
vertical mobility as related to adolescent sexual practices; there is no 
discussion whatever on vertical mobility in Vol. II. Why the differences 
in approach? 

(1) Our point in sharply drawing a distinction between an actual 
incidence figure and an accumulative incidence figure (CMT, p. 124) 
lay in what seemed to be a confusion between the two—both in the 
minds of those who quoted KPM and in KPM themselves. It seemed
especially important to call attention to the distinction since the ac- 
cumulative incidence figure is a larger number than the actual incidence 
figure. We called the 43.3% figure an “artifact” because it could be 
taken to characterize the behavior of 2816 individuals involved 
whereas, in actuality, it characterizes only those 97 individuals who 
reached 50 years of age. We then quoted a passage from KPM (CMT,
p. 124) which, to us, seemed to illustrate the confusion. In retrospect, 
we realize now that we may not have been entirely fair to KPM. 
Nevertheless, we feel that their unclear wording on the point has con- 
tributed to a popular misunderstanding. For another example (since 
CMT question our cited example), on p. 663, Vol. I, KPM write: “In- 
asmuch as our present data indicate that more than a third . . . of the 
white males in any population . . . have had at least some homosexual 
experience ... ” (italics ours). Other examples can be found in Vol. I 
and in Vol. II. Further, we are still uncertain as to the value of present- 
ing data in terms of “accumulative incidence figures.” Apart from 
whether the underlying assumption is met, such a figure removes us 
from the raw data themselves. We would like to know (and we may be 
expressing a personal preference) what the actual incidence figures are, 
prior to the computation of the “accumulative incidence figures.” As a
suggestion, data can be presented in the form of a scattergram (for
certain critical variables at least). The abscissa can refer to the actual
age of interviewee at the time of interview; the ordinate can refer to
the reported age of first experience.

(2) We still maintain that KPM incorrectly generalize from their
data (CMT, p. 148). KPM assert a generalization and cite, so it would
seem, “parental classes 3, 4, or 6” by way of illustration. If KPM
wished to exclude those classes which contradicted their generalization,
they should have stated so explicitly without leaving it up to the reader
to discover this for himself by checking through various tables. More-
over, the reader must discover for himself that “mean frequency” is the
index intended by KPM rather than any of the other indices.

(3) CMT’s citation (p. 28) of KPM’s explanation of different totals 
in different tables troubles us. On page 208, Table 41 of KPM two sets 
of mutually exclusive categories are presented. We note that the sum 
of frequencies for occupational class exceeds those for educational level 
by more than 800. 

(4) With regard to the existence of a seventh outlet (CMT, p. 228) 
we were influenced by KPM’s wording, “the six chief sources of
orgasm” (Vol. I, p. 193). We thought that they had fetishism and other 
perversions in mind. 

(5) CMT write, with regard to their interviews by KPM, that they 
made “rough and ready guesses” in so far as frequency figures were 
concerned. Their impression coincides with that of one of the present 
writers who was interviewed by one of the KPM team in 1948. He felt 
at that time that frequency figures, especially when they pertained to 
some practice engaged in 15 or 20 years prior to the interview, were 
within a given range, sheer guesses. Perhaps similar considerations 
apply to others who were interviewed. We wonder what the mean and 
median frequency figures in the KPM tables really signify. 


* * * 


Lewis M. Terman, Stanford University! 


I WOULD now make the following changes in the quotations by CMT
from my review of Kinsey’s Sexual Behavior in the Human Male. Each
quotation is identified by the page where it appears in the volume by 
CMT. 

Page 69. In the fourth line from the bottom of the quotation change 
the word “reliably” to “appreciably.” 

Page 95. Leave as is to the third word in line 7. The remainder of 
this quotation should be changed to read as follows: 





1 “Kinsey’s ‘Sexual Behavior in the Human Male’: some comments and criticisms,” Psychological 
Bulletin, 45, No. 5 (1948), pp. 443-59. 


“In the comparisons of both the total population and the active
population the Kinsey figures are higher in nearly all cases. Though 
the specific differences are not statistically significant the trend of 
difference is consistent in direction.”* (Keep footnote as is.) 

Pages 112-113. In the third line on p. 113 change the phrase “would 
have been much greater” to “would probably have been greater.” 

Page 138-139. This long quotation from my review would have been 
more fair had I added the paragraph from KPM (pp. 415-416) that 
is quoted on p. 141 of CMT. 


* * * 


PAUL WALLIN, Stanford University1


THE position taken by CMT in regard to most of my criticisms of
KPM is entirely acceptable to me. There is little, if any, disagree-
ment between us on the issues discussed in my review. Indeed I feel
impelled to register dissent with only one of their comments on my 
work. 

This concerns my criticism of KPM for reporting hundreds of means
and medians based on less than 50 cases, many on less than 10 and
some on less than 4. CMT observe (pp. 88-89) that 

Perhaps the criticism is not totally justified. There seems to be no general 
reason for throwing away a sample of size 20 or 30 just because it is small, 
especially when it may be the only data on the subject. In many fields of 
endeavor a sample of 30 is regarded as quite substantial (testing A-bombs 
for example). It does seem reasonable, however, to choose a minimum 
sample size for a cell, and report smaller numbers only with some danger 
signal attached to warn the reader of the instability of the entry. 


In view of the difficulty of securing subjects for the KPM kind of 
research I would agree that samples of size 20 or 30 should not neces- 
sarily be discarded because of their smallness. However, given the great 
range of variability of human behavior—particularly in as large and 
complex a society as the United States—I question the value of using 
non-random samples of this magnitude—even if accompanied by a 
danger signal—as a basis for deriving national statistics on various 
sexual practices of American males. I likewise am dubious about the 
accuracy of means or medians calculated from 10 or fewer cases as 
estimates of the frequency of different types of sexual behavior within 
what CMT call the “sampled population” as distinguished from the 
“target population.” 





1 “An Appraisal of Some Methodological Aspects of the Kinsey Report,” American Sociological
Review, Vol. 15, No. 2 (1949), pp. 197-210.





In testing A-bombs considerable confidence might be attached to 
results achieved even with a sample of 1, since bombs are constructed 
under controlled conditions intended to maximize the probability of 
homogeneity in their behavior. Unfortunately for the problems it poses 
for social scientists, the behavior of human beings entails many degrees 
of freedom and is thus likely to be much more heterogeneous than that 
of A-bombs. 

Other than the above, there are two very minor points which I should 
like to comment on briefly. 

(1) In discussing the KPM comparison of the original and retake 
histories of 162 subjects I said that inconsistency between the two 
sets of histories “would suggest that the data being sought cannot be 
accurately reported by the subjects. The evidence of witnesses whose 
testimony varies from one time to another is of questionable accuracy.” 
CMT note (p. 96) that if the latter statement 

. .. is taken literally it implies that all readings of voltmeters, balances and 
other physical instruments are of “questionable accuracy.” If so, this would 
not be a serious criticism. 
It hardly need be said that the differences alluded to in my statement 
were significant differences rather than differences of any degree. I
pointed out in a subsequent paragraph that “... the data indicate 
test-retest reliability is high on incidence items but far below acceptable 
standards on frequency items.” 

(2) In discussing the possible bias in the KPM sample due to re- 
fusals, I stated “That the number of non-volunteers was considerable 
is indicated by the fact that ‘Perhaps 50,000 persons have heard about 
the research through lectures and perhaps half of the histories now in 
hand [i.e. 6000 out of 12,000] have come in consequence of such con- 
tacts’ (KPM, p. 38). The authors’ comment on another study is ap- 
plicable to their own. They say it is “... open to the very severe 
criticism that it involved only a highly selected sample of the total 
population. What is more serious, one is left guessing as to the histories 
of the 51 per cent that failed to answer the questionnaire’ (KPM, 
p. 619).” 

CMT assumed I was implying here that KPM had a refusal rate of 
88% (44,000 non-volunteers out of 50,000 asked to volunteer for inter- 
views). Consequently, it is CMT’s calculation, and not my more gen- 
eral inference that “...the number of non-volunteers was consider- 
able... ,” which KPM have characterized (p. 61) as 

. . . meaningless, since many of the 50,000 were not approached in any way
to give histories.


* * *


W. ALLEN WALLIS, University of Chicago1


THE CMT report is excellent. In thoroughness, competence, objectivity,
wisdom, originality, and lucidity it leaves nothing to be
desired. It is a credit not only to its authors but to the Association that 
sponsored it and the officers who selected the authors. 

The purpose of this comment, however, is not to review CMT in 
general, but to consider any disagreements between CMT and my own 
review of KPM, which they scrutinize word by word—some words 
several times, in fact. The cynic will sense from my opening paragraph 
that in the CMT report I have found not substantial criticism of my 
review but repeated agreement and approval. Nevertheless, by check- 
ing every listing of my name in the index, I have found half a dozen 
direct or indirect demurrers to points of mine. They, and my comments 
on them, are all so small and dull that the reader would do well to skip 
directly from here to (or perhaps beyond) my last paragraph. 

(1) CMT point out (p. 86) that it would be difficult to make mean- 
ingful tests of the significance of such differences as those between the 
KPM interviewers. They are right, though it is an exaggeration to say 
that every interviewer would have had to visit every collecting site. 
While my remark does refer to the first Kinsey report, it occurs, as the 
first sentence they quote shows, in connection with suggestions for fu- 
ture work. Furthermore, it is not as clear to me as it seems to be to 
CMT that KPM in pages 133-43 are not attempting to assess the 
statistical significance of the differences whose smallness they em- 
phasize. The CMT position could be supported if one were to give 
KPM credit for reasonably precise and careful use of statistical terms; 
for example, they call the differences “not material,” but do not call 
them “not significant.” They do, however, compare differences in 
means with the variability of individual observations, they do refer to 
“the independent interviewing of three persons drawing their samples 
very largely at random... ,” they do imply a sign test between two 
interviewers, and they do draw at least one clear conclusion from their 
sample of three interviewers to “any other group of interviewers” 
(all on p. 135). Altogether, in this and similar comparisons, they 
create in my mind some atmosphere of significance-testing and some 
impression that differences, by and large, either are not significant or 
can be explained away by other variables. 

(2) CMT are right in believing (p. 107) that I make a distinction in 
meaning between “fail to provide confidence” and “destroy confidence” 
or “contradict.” 





1 “Statistics of the Kinsey report,” Journal of the American Statistical Association, 44 (1949), pp.
463-84.








(3) I cannot agree with CMT’s contention (p. 109) that KPM are 
“correct, informative and helpful” when they say that “a few high- 
rating individuals affect the means more than [many] low-rating in- 
dividuals”; I would have to know how high-rating are the few in- 
dividuals and how numerous are the low-rating individuals in the com- 
parison. It is likely, of course, that KPM were groping for the correct 
idea which CMT state, that in distributions skewed to the right, the 
mean exceeds the median; but my criticism pertained to the confusion 
of their presentation. 
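
The point that the claim depends on how high the few are and how numerous the many are can be checked with made-up numbers. The sketch below is only an illustration under hypothetical ratings (none taken from KPM): it starts from ten individuals with the same rating and compares the shift in the mean caused by adding two very high raters, two mildly high raters, or forty low raters. It also shows the mean exceeding the median in a sample skewed to the right.

```python
from statistics import mean, median

# Ten hypothetical individuals, each with a rating of 2.
base = [2] * 10


def shift_in_mean(extra):
    """Change in the mean of `base` when the `extra` individuals are added."""
    return mean(base + extra) - mean(base)


few_very_high = [30, 30]    # two individuals far above the rest
few_mildly_high = [4, 4]    # two individuals only slightly above the rest
many_low = [0] * 40         # forty individuals below the rest

print(shift_in_mean(few_very_high))
print(shift_in_mean(few_mildly_high))
print(shift_in_mean(many_low))

# A sample skewed to the right: the mean lies above the median.
skewed = base + few_very_high
print(mean(skewed), median(skewed))
```

In this toy case the two very high raters move the mean more than the forty low raters do, but the two mildly high raters move it less, so the unqualified statement cannot be judged without the magnitudes and counts.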

(4) I agree with CMT (p. 110) that necessary and sufficient con- 
ditions for equality of the mean and median need not be discussed 
by KPM. My illustration was simply one of several to indicate the 
presence in the KPM report of many a “matter like a misspelled word, 
which an author would change if his attention were called to it.” To see 
how the author might want to change this one, I suggested two possible 
interpretations that occurred to me, but indicated dissatisfaction with 
both. 

(5) My advice to make comparisons among the component groups of 
a whole rather than between a component and the whole related to 
planning methodological checks in future work. One of the comparisons 
to which I alluded, that between age groups, may have been inap-
propriate, since this comparison was not made for methodological pur-
poses (though the advice may be appropriate to it); and this ill-chosen
allusion seems to have diverted CMT’s attention from the context of 
the passage they quote (p. 111). The remarks they make—they avoid 
taking a position—are, however, valid if taken out of context. 

(6) That CMT analyze so carefully (pp. 111-12) my references to 
sequential procedures suggests that they credited me with having 
something deeper in mind than I did. In referring to the problem of 
determining sample size, I stated that it may be impractical in the 
case of averages to specify the required accuracy as a fixed percentage 
of the true figure. The point here is that frequently it is technically 
impossible to specify a fixed sample size that, with prescribed con- 
fidence, meets an accuracy requirement so specified. This kind of 
technical difficulty is sometimes surmountable by sequential methods. 
Stein’s double-sampling procedure for which CMT give the biblio- 
graphical details is the one I assumed my mention of double-sampling 
would bring to the minds of statisticians; his sequential procedure, 
however, was then very new and virtually unknown, so I cited it more 
specifically. CMT point out that Stein’s sequential procedure was not 
available when KPM’s work was done; I mentioned it in discussing the 








am & & SS Sf 


r in- 


om- 
rect 

the 
sion 


con- 
ssed 

the 
ord, 
) See 
ible 
vith 


THE COCHRAN-MOSTELLER-TUKEY REPORT 829 


determination of sample sizes for future work (pp. 467-8 of my review). 

These six points are trivial, but they are the only ones I can find 
where CMT seem to take even this much issue with the points made in 
my review. As far as I can see, the differences between us are negligible. 

Turning to a broader issue, CMT apply one criterion to KPM’s 
work that merits further consideration. They compare KPM’s statisti- 
cal methods with those of other studies of sex, rather than other studies 
involving surveys. Other studies of sex are not numerous and, judging 
by the descriptions by KPM (pp. 23-34) and by William O. Jenkins 
(CMT, Appendix B), not good. More important, those few that are in 
any way comparable with KPM’s were published at least a decade 
before KPM’s, and the intervening decade was one of extraordinarily 
rapid progress in techniques of sampling, and of analyzing samples 
from, human populations. Thus, the comparison with other sex studies 
has a little the character of proclaiming a champion on the basis of 
knocking down a straw man. On the other hand, it is unrealistic to 
expect workers in any field to keep abreast of the latest methodological 
developments, however vital to their own research. Furthermore, 
Kinsey’s interviewing started a decade before the KPM report was 
published, and many of the methods for whose absence the KPM re- 
port is criticized had not been developed nearly so far then as now. 
Perhaps the best statistical methods used previously in the particular 
field of application provide an appropriate lower bound on what it is 
reasonable to expect, and the best methods used previously in any 
field an appropriate upper bound. 


* * * 


WILLIAM G. COCHRAN, The Johns Hopkins University
FREDERICK MOSTELLER, Harvard University
JOHN W. TUKEY, Princeton University


WE APPRECIATE the opportunity given us by the Editor of the
Journal to add to this discussion. But we feel that our own views
have already been stated at considerable length. We wish to thank the 
reviewers for their thoughtful discussions and useful amplifications. 
Although we are not in complete agreement with all the points made, 
our remaining differences hinge mostly on questions of judgment, 
emphasis, or frame of reference. 





THE EFFICIENCY OF DOUBLE SAMPLING 
FOR ATTRIBUTES 


H. C. HAMAKER AND R. VAN STRIK


Philips Research Laboratories, N. V. Philips’ Gloeilampenfabrieken,
Eindhoven, Netherlands


§1. INTRODUCTION


IN A random walk diagram where we plot the total number inspected
as abscissa and the number of defectives observed as ordinate, a
double sampling plan is represented by two screens S1 and S2 as indi-
cated in Fig. 1. When after inspection of a first sample of n1 items the
random walk ends in section A1 of screen S1 the lot is accepted, and
when it ends in R1, the lot is rejected. But whenever we arrive some-
where in the opening D1 the case is dubious; we proceed to take a sec-
ond sample of size n2 and pass a final judgment according to the sec-
tions A2 and R2 of screen S2. Evidently a double sampling plan re-
quires 5 parameters for its full specification; two sample sizes and
three “decision numbers.”

Fig. 1. Random walk diagram for a double sampling plan. (Abscissa: number of items inspected.)


DOUBLE SAMPLING FOR ATTRIBUTES 831 


If we wish to replace a single sampling plan by a double sampling
plan we shall as a rule require both plans to possess nearly the same
operating characteristics (OC). Since a single sampling plan is fixed
by two parameters and the double plan by 5 there will be a considerable
variety of double sampling plans satisfying this requirement, and the
choice of the most suitable one among these becomes a complex problem.

In a way this problem was solved by the Statistical Research Group
[6] and in the subsequent Military Standard 105A [4], and since the
tables contained in this standard have been widely accepted we may
conclude that the solution is a satisfactory one.


Fig. 2. Random walk diagram for an intuitively inefficient double
sampling plan.

Yet it may be doubted whether it is in every respect the best
solution conceivable, as the following argument will show. Apart from
random deviations a random walk created by the inspection of items
taken from a homogeneous lot will move along a straight line through
the origin. Hence if we draw a straight line from the origin to the
dividing point P2 in the second screen (Fig. 1) this line should
preferably pass somewhere through the center of the opening D1. For
with a set of screens as depicted in Fig. 2 the judgments based on the
first and the total sample are not properly balanced. We accept some
lots by the first sample which would have a high chance of being
rejected should we proceed to take a second sample; and we will
likewise be led to consider some lots as doubtful by the first sample
which are very likely to be rejected after inspection of a second
sample. Intuitively we conclude that double plans with a setup as in
Fig. 2 must be inefficient.

Generally we must therefore require that in a double sampling plan

\[ (c_1 + \tfrac{1}{2}) < \frac{n_1}{n_1 + n_2}\,(c_3 + \tfrac{1}{2}), \tag{1a} \]

\[ (c_2 + \tfrac{1}{2}) > \frac{n_1}{n_1 + n_2}\,(c_3 + \tfrac{1}{2}), \tag{1b} \]

where n1 and n2 are the sample sizes and c1, c2, and c3 the three decision
numbers. Throughout we shall use the lower decision numbers, so
defined that a lot is accepted when the number of defectives is c1 or less
in the first, or c3 or less in the total sample; and a second sample is
taken when in the first sample we have observed a number of defectives
greater than c1 but less than or equal to c2.

Since we accept at c3 and reject at c3+1 defectives we have assumed
the dividing point in the screen S2 to be at (c3 + 1/2), etc.

In the Mil. Std. 105A c2=c3 and condition (1b) is consequently
always satisfied. Since in these tables n2=2n1 condition (1a) becomes

\[ (c_1 + \tfrac{1}{2}) < (c_3 + \tfrac{1}{2})/3 \tag{2} \]

and many of the double sampling plans incorporated in the Mil. Std.
105A do not satisfy this last criterion. This suggests that it is perhaps
possible to find other double sampling plans with nearly the same OC
but a better efficiency.
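
As a small illustration, the balance conditions can be checked mechanically. The sketch below assumes that (1a) and (1b) compare c1 + 1/2 and c2 + 1/2 with the height n1/(n1+n2) · (c3 + 1/2) at which the line from the origin to P2 crosses the first screen; the function name `balance_check` is hypothetical and not part of the paper.

```python
from fractions import Fraction

def balance_check(ratio, c1, c2, c3):
    """Return (ok_1a, ok_1b) for a plan D(ratio; c1, c2, c3), where ratio = n2/n1."""
    half = Fraction(1, 2)
    # Height at which the straight line from the origin to P2 crosses screen S1:
    # n1 / (n1 + n2) * (c3 + 1/2) = (c3 + 1/2) / (1 + ratio).
    crossing = (c3 + half) / (1 + Fraction(ratio))
    ok_1a = c1 + half < crossing   # lower edge of the opening lies below the line
    ok_1b = c2 + half > crossing   # upper edge of the opening lies above the line
    return ok_1a, ok_1b

# Plans discussed in the text, written as (n2/n1, c1, c2, c3).
for plan in [(2, 5, 10, 10), (2, 2, 10, 10), (2, 5, 13, 13), (2, 2, 9, 9), (2, 1, 4, 8)]:
    print(plan, balance_check(*plan))
```

With n2/n1 = 2 the first component is exactly criterion (2), so plans such as D(2; 5, 10, 10) and D(2; 5, 13, 13) fail it while D(2; 2, 10, 10) passes.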

The original aim of the researches described in this paper was to 
determine how far this intuitive argument is correct and whether 
the improvements, if any, would be worth practical consideration. In 
the course of our work it appeared, however, that there are many 
other aspects of double sampling, such as the use of truncation in the 
second sample and the choice of parameters which have to be taken 
into account to complete the picture. Hence all these factors will be 
discussed in due course. 

In §§2, 3, and 4 we begin by developing a theory of double sampling
efficiency which already has been described briefly in earlier papers
[2, 3] and will form the basis of all our later arguments.







§2. “EQUIVALENT” SAMPLING PLANS 


It will be convenient to use the following short-hand notation for
double sampling plans

\[ D(n_2/n_1;\, c_1, c_2, c_3), \tag{3} \]

where n2/n1 denotes the ratio of the sample sizes, and c1, c2, and c3 the
three decision criteria. For example the symbol

\[ D(2;\, 2, 6, 10) \tag{4} \]

designates all double sampling plans where the second sample is twice
as large as the first; and by which a lot is accepted when the first
sample contains 2 or less defectives, or the total sample 10 or less
defectives, while the lot is rejected when the first sample contains more
than 6, or the total sample more than 10 defectives.


Fig. 3. The operating characteristic plotted against log p. A change of
sample size produces a horizontal shift but no other changes. (Abscissa:
logarithm of percentage defective; ordinate: probability of acceptance.)

By the symbol (3) or (4) a double sampling plan is fixed apart from
the size n1 of the first sample. As long as we assume Poisson
conditions, which hold true in most industrial applications, the
probability of acceptance for a plan described by (3) will be a function
of n1 p, where p is the fraction defective in the lot. The most convenient
way to use this fact is by plotting the OC, not against p as is usually
done, but against log p as done in Fig. 3. A change in n1 will then
correspond to a shift of the characteristic parallel to itself. Hence, if
we wish to compare different sampling plans, we can always start by
adjusting the sample sizes so that the operating characteristics have a
specific point in common, for example the point of control, or
indifference quality, p50 defined by

\[ P(p_{50}) = 0.5, \tag{5} \]

where P is the probability of acceptance.
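
The dependence on n1 p alone can be made concrete with a short sketch. It assumes the Poisson acceptance formula given in the Appendix (accept on c1 or fewer defectives in the first sample, or on c3 or fewer in total after a second sample); the helper names `accept_prob` and `m1_at_half` are hypothetical, and bisection is used here merely as a convenient root finder.

```python
import math

def q(k, m):
    """Poisson probability of exactly k defectives when the mean is m."""
    return math.exp(-m) * m**k / math.factorial(k)

def R(c, m):
    """Poisson probability of c or fewer defectives (zero for c < 0)."""
    return sum(q(k, m) for k in range(c + 1))

def accept_prob(m1, ratio, c1, c2, c3):
    """OC of D(ratio; c1, c2, c3) under Poisson conditions, with m1 = n1 * p."""
    m2 = ratio * m1
    second = sum(q(k, m1) * R(c3 - k, m2) for k in range(c1 + 1, c2 + 1))
    return R(c1, m1) + second

def m1_at_half(ratio, c1, c2, c3, lo=1e-6, hi=100.0):
    """Bisection for n1 * p50, the value of m1 at which the OC equals 0.5."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if accept_prob(mid, ratio, c1, c2, c3) > 0.5:
            lo = mid   # still accepting too often: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

plan = (2, 2, 6, 10)              # the example plan (4)
m50 = m1_at_half(*plan)
for n1 in (50, 100, 200):
    # Changing n1 only rescales p50; on a log p axis this is a pure shift.
    print(n1, "p50 =", m50 / n1)
```

Because the plan enters only through m1 = n1 p, doubling n1 halves p50, which is the parallel shift on the logarithmic scale described above.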

If we wish to investigate the efficiency of double as compared to
single sampling, the first step is to find two plans which possess OC’s
as nearly alike as possible; for otherwise the inspection performances
achieved by the two plans are different and the numbers of observations
required are not comparable. To this end we start by changing
the sample sizes so that the two operating characteristics possess the
same point of control. In Fig. 4, for example, the OC for the plan
D(2; 5, 10, 10) is compared with those for single plans with c0=5 and
c0=6, after this adjustment has been performed; both give a reasonably
close fit and at first sight it is difficult to judge which of the two is
best.


Fig. 4. The OC for the double sampling plan D(2; 5, 10, 10) compared, after
an appropriate adjustment of the sample size, with those of the single sampling
plans for c0=5, 6, and 5.23. (Abscissa: logarithm of percentage defective;
ordinate: probability of acceptance.)
In order to achieve some precision in our argument it is therefore
necessary to effect the adjustment by a more accurate mathematical
treatment. To this end we have adopted the principle that two operating
characteristics may be considered as equivalent when they not only
possess the same point of control p50, but also the same slope at that
point. It will be convenient to use the slope of the curve when p is
plotted on a logarithmic scale, because this slope is, under Poisson
conditions, independent of the absolute value of the sample sizes.
Hence we have defined as a second parameter the relative slope

\[ h = -\left(\frac{d \ln P}{d \ln p}\right)_{p_{50}} = -\left(\frac{p}{P}\,\frac{dP}{dp}\right)_{p_{50}}, \tag{6} \]

where P is the probability of acceptance, and we will call two OC’s
“equivalent” when they possess the same values of h and p50.¹

Even so a complete adjustment cannot be attained, because the
acceptance criteria, and h in consequence, vary in discrete steps. But
this difficulty can be overcome by introducing broken values of the
acceptance number c0 of the single sampling plan. If, while drawing
a sample, we determine at the same time by a lottery with probabilities
0.77 for a 5 and 0.23 for a 6, whether the acceptance number shall be
5 or 6, such a plan may be defined as belonging to an acceptance number
5.23; and its operating characteristic is obtained by linear interpolation
between the curves for c0=5 and c0=6. Such a method is not
recommended for actual practice, but in a theoretical argument it
enables us to find a single sampling plan with exactly the same slope
h as a given double sampling plan, and consequently with a very close
approach in its operating characteristic, as Fig. 4 shows. We might
similarly introduce broken values of the sample size n0 in order to
achieve perfect adjustment in the point of control p50. Usually, however,
sample sizes are so large that the error committed in rounding n0
to an integer value may be neglected. The introduction of broken values
of c0 is more essential.
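
The lottery construction amounts to a weighted average of two ordinary single-sampling OC curves. A minimal sketch under Poisson conditions follows; the function names are hypothetical and the code is only an illustration of the interpolation, not a recommended inspection procedure.

```python
import math

def poisson_cdf(c, m):
    """Probability of c or fewer defectives when the Poisson mean is m."""
    return sum(math.exp(-m) * m**k / math.factorial(k) for k in range(c + 1))

def oc_single(m, c0):
    """OC of a single plan with a possibly broken acceptance number c0 (m = n0 * p).

    A broken value such as 5.23 means: use c0 = 5 with probability 0.77
    and c0 = 6 with probability 0.23.
    """
    lower = math.floor(c0)
    frac = c0 - lower
    return (1.0 - frac) * poisson_cdf(lower, m) + frac * poisson_cdf(lower + 1, m)

for m in (3.0, 5.0, 5.9, 7.0, 9.0):
    print(m, round(oc_single(m, 5), 4), round(oc_single(m, 5.23), 4), round(oc_single(m, 6), 4))
```

At every value of m the curve for 5.23 lies between the curves for 5 and 6, at 23 per cent of the distance from the former to the latter.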

The adjustment is greatly facilitated by the fact that the parameters
p50 and h are, for single sampling plans, connected to the sample size
n0 and the acceptance number c0 by two very simple relations [2, 3],

\[ n_0 p_{50} = c_0 + 0.67, \tag{7} \]

\[ \tfrac{\pi}{2}\, h^2 = n_0 p_{50} + 0.06 = c_0 + 0.73. \tag{8} \]

The method of comparison is now as follows. For a given double sampling
plan we find the values of h and p50 by numerical computation,
and then obtain n0 and c0 from (7) and (8); these data specify an
equivalent single sampling plan. Details of the numerical computation
are given in an Appendix.
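
The last step of this method is simple arithmetic and can be sketched as follows. The sketch assumes that relation (8) reads (π/2)h² = c0 + 0.73, a reading that reproduces the values c0 = 6.88 and n0 p50 = 7.551 quoted in Table 3 for the plan D(2; 1, 4, 8); the function name `equivalent_single_plan` is hypothetical.

```python
import math

def equivalent_single_plan(h, n1_p50, n1):
    """Return (c0, n0 * p50, n0) for the single plan equivalent to a double plan.

    h       relative slope of the double plan at p50
    n1_p50  value of n1 * p50 for the double plan
    n1      first sample size of the double plan
    """
    c0 = (math.pi / 2) * h**2 - 0.73   # relation (8), as read above
    n0_p50 = c0 + 0.67                 # relation (7)
    n0 = n1 * n0_p50 / n1_p50          # both plans share the same p50
    return c0, n0_p50, n0

# Values for D(2; 1, 4, 8) taken from Table 3, with n1 = 75 as in Table 1.
c0, n0_p50, n0 = equivalent_single_plan(h=2.201, n1_p50=2.954, n1=75)
print(round(c0, 2), round(n0_p50, 3), round(n0))
```

Rounded to an integer, the resulting n0 agrees with the entry for this plan in Table 1.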





1 In earlier papers [2, 3] the notation p0, h0 has been used instead of p50, h. The advantage of using
p50 is obvious; with p95 or p10 we can then describe other points on the OC. It is not necessary to indicate
the slope at the same time as h50 because we will consider the slope only at the point p50.







Equivalence, as here defined, does not mean, of course, that the 
OC’s are completely coincident; near the top or the bottom of the 
curves slight differences remain. But equivalence does mean that the 
OC’s are on the whole so close to each other that the differences have 
no appreciable influence in practical applications; for practical purposes 
the OC’s may be considered as coincident. 


§3. AN ALTERNATIVE DEFINITION OF EQUIVALENCE


Commonly OC’s are not specified by the parameters p50 and h,
but by two fractions defective, say, p95 and p10 corresponding to
acceptance probabilities of 0.95 and 0.10 respectively.

For single sampling plans and under Poisson conditions the ratio
p10/p95 is, like the slope h, a function of the acceptance number c0
alone. Hence an “equivalent” single sampling plan might alternatively
be defined as the plan with the same value of p10/p95 as the double
plan considered and with a sample size n0 so adjusted that the two
OC’s coincide at p95 and p10.


TABLE 1
COMPARISON OF TWO DEFINITIONS OF “EQUIVALENCE”

                                 Single sampling plan adjusted by means of
                                 p50, h              p10, p95
Double sampling plan             n0       c0         n0'      c0'        n0/n0'

D(…, 1, 1),       n1 = 90        139      …          132      …          1.053
D(…, 4, 4),       n1 = 90        238      …          231      …          1.030

D(2; 5, 13, 13),  n1 = 150       196      …          203      …          0.966
D(2; 2, 9, 9),    n1 = 90        202      …          194      …          1.041
D(2; 1, 4, 8),    n1 = 75        192      …          185      …          1.038

For a set of 5 different double sampling plans the two definitions of
equivalence considered above are compared in Table 1. The last set
of three are double sampling plans with approximately the same OC’s
but with different decision numbers. The plan D(2; 5, 13, 13)
corresponds to a random walk diagram as in Fig. 2, that is, to a presumably
inefficient choice of the acceptance numbers; the plan D(2; 2, 9, 9) still
adheres to the c2=c3 principle but the acceptance numbers have been
better chosen, while for the last plan in Table 1, c2≠c3.







The differences between the two definitions of equivalence appear 
to be slight; they amount to not more than 5 per cent of the sample 
size. 

The OC’s of the plans occurring in one row in Table 1 lie very close 
together, so close in fact that it would be difficult to draw them clearly 
separate in one figure. For the sampling plan D (2; 2, 9, 9) the differ- 
ences are shown in Table 2; for the other plans they were of the same 
order of magnitude. As might be expected the plan adjusted by means 
of p50 and h lies closer to the double sampling plan in the middle range
of P values but further away at large or at small values of P. 


TABLE 2
ACCEPTANCE PROBABILITIES FOR D(2; 2, 9, 9) AND THE
EQUIVALENT SINGLE SAMPLING PLANS ADJUSTED BY
p50, h, AND BY p10, p95 RESPECTIVELY

                                        Plan
Percentage     D(2; 2, 9, 9)     Single plan adjusted      Single plan adjusted
defective      n1 = 90           by p50, h                 by p10, p95
                                 c0 = 7.23, n0 = 202       n0' = 194 (Table 1)

    1          0.9996            0.9992                    0.9993
    2           .9636             .9555                     .9625
    3           .7688             .7623                     .7903
    4           .4734             .4733                     .5155
    5           .2431             .2340                     .2711
    6           .1159             .0963                     .1196
    7           .0551             .0343                     .0458
    8           .0266             .0109                     .0157













On the whole both definitions of equivalence seem equally accept- 
able. In the following we have used the definition of §2 because it is 
easier in numerical computations and has the advantage of formulas
(7) and (8). If another definition is adopted this may cause systematic 
changes of 3 to 5 per cent in the efficiencies as defined in the next sec- 
tion. Such changes will not alter the main conclusions at which we 
shall arrive. 

§4. A THEORY OF EFFICIENCY


By means of p50 and h we can now find an equivalent single sampling
plan in a reasonably precise manner and we proceed to compare the 
average number of observations required by equivalent sampling plans. 







The ratio of the average sample size of a double sampling plan to the
sample size of its equivalent single sampling plan will be called
inverse efficiency. The inverse efficiency measures the amount of
inspection under double sampling in terms of the amount of inspection
required by a single sampling plan with the same inspection performance.
In keeping with general practice we assume that truncation may be
permitted in the second sample of a double sampling plan but is not
allowed in the first sample or in single sampling. In this section we
investigate the case where no truncation is used; the effect of truncation
is discussed in §§5 and 6.

The average sample size of a double sampling plan, and the inverse 
efficiency in consequence, vary with the fraction defective in the lot 
inspected. Actually we get more convenient curves if we plot the in- 
verse efficiency not against p but against the corresponding probability 
of acceptance P, using a normal probability scale. The curves thus 
obtained will be termed efficiency characteristics; these we shall use to 
study the efficiency of various double sampling plans. 
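
Without truncation the average sample size is n1 plus n2 times the probability that the first sample ends in the opening of the first screen, so the inverse efficiency can be sketched in a few lines. The sketch assumes Poisson conditions; the name `inverse_efficiency` is hypothetical, and the ratio n1/n0 = 0.392 used in the example is the value quoted in Table 3 for D(2; 1, 4, 8).

```python
import math

def q(k, m):
    """Poisson probability of exactly k defectives when the mean is m."""
    return math.exp(-m) * m**k / math.factorial(k)

def R(c, m):
    """Poisson probability of c or fewer defectives."""
    return sum(q(k, m) for k in range(c + 1))

def inverse_efficiency(m1, ratio, c1, c2, n1_over_n0):
    """Average sample size of the double plan divided by n0, no truncation.

    m1 = n1 * p, ratio = n2 / n1, and n1_over_n0 comes from the
    equivalence step (equivalent single sample size n0).
    """
    p_second = R(c2, m1) - R(c1, m1)   # more than c1 but at most c2 defectives
    return n1_over_n0 * (1.0 + ratio * p_second)

# D(2; 1, 4, 8): inverse efficiency for a range of lot qualities m1 = n1 * p.
for m1 in (0.5, 1.0, 2.0, 2.954, 4.0, 6.0):
    print(m1, round(inverse_efficiency(m1, 2, 1, 4, 0.392), 3))
```

Plotting these values against the corresponding probability of acceptance, rather than against m1, gives a curve of the kind the text calls an efficiency characteristic.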

The double plan D(2; 5, 10, 10) does not satisfy the criterion (2) 
of §1 and must by the arguments of that section intuitively be classified 
as inefficient. This view is strikingly corroborated by its efficiency 
characteristic as shown in Fig. 5. Even for good lots with an acceptance 
probability above 0.99 the inverse efficiency is hardly less than unity 
and the average amount of inspection is for the double sampling plan 
almost the same as for the equivalent single sampling plan. For lots of 
poor quality the double sampling plan is much worse. 

If we now change over from the plan D(2; 5, 10, 10) to the plan
D(2; 2, 10, 10) we must by the argument of §1 expect an improvement 
in efficiency and this too is brought out by Fig. 5. 

By the efficiency characteristics in Fig. 5, two double sampling plans 
are compared each with its own equivalent single sampling plan. Can 
these curves also be interpreted as portraying the efficiency of the two 
double sampling plans with respect to one another? We believe they 
can for the following reason: The two double plans under discussion 
possess different slopes of their operating characteristics, and conse- 
quently the average sample number curves for these plans are not 
comparable. Generally, however, we know that the steeper an OC the 
larger the number of observations required, and this holds true for 
single, double, and sequential sampling alike. Can we measure the 
influence of the slope and correct for it? 

Numerical computation yields for the plan D(2; 2, 10, 10) a relative
slope h=2.427, and for the plan D(2; 5, 10, 10) h=1.949. Hence by
equation (8) the ratio of the equivalent single sample sizes turns out
to be

\[ \frac{n_0}{n_0'} = 1.56. \]

To achieve the greater slope of the OC of the plan D(2; 2, 10, 10)
would require under single sampling 1.56 times as many observations.
If we assume that this factor holds for double sampling as well, we are
in a position to eliminate the difference in slope and thereby to render
average sample number curves for the two double plans more directly
comparable. This is exactly what is achieved by the inverse efficiency
as introduced above.


Fig. 5. Efficiency characteristics for the double plans D(2; 5, 10, 10) and
D(2; 2, 10, 10). By reducing the acceptance number c1 from 5 to 2 a striking
improvement is achieved. (Abscissa: probability of acceptance; ordinate:
inverse efficiency; the region of practical interest is marked.)
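
The factor 1.56 can be retraced as a short worked calculation. It assumes that relation (8) reads (π/2)h² = n0 p50 + 0.06 and that the two single plans are compared at a common p50, so that the ratio of the products n0 p50 equals the ratio of the sample sizes.

```latex
\[
\frac{n_0}{n_0'}
  = \frac{\tfrac{\pi}{2}\,(2.427)^2 - 0.06}{\tfrac{\pi}{2}\,(1.949)^2 - 0.06}
  \approx \frac{9.19}{5.91}
  \approx 1.56 .
\]
```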

We may put the same argument in a somewhat different way.
Single sampling plans with different acceptance numbers have different
relative slopes, and from the point of view of efficiency they cannot
be compared. It is therefore not unreasonable to accept single sampling
plans as a general standard of reference which has by definition unit
efficiency; we can then refer all other sampling plans to this common
standard.


Fig. 6. Efficiency characteristics for the set of plans
D(2; c1, 10, 10) when c1 = 5, 4, ..., 0.

Fig. 7. Efficiency characteristics for the plans
D(2; 2, c2, 10) with c2 = 10, 6, 5, 4, and 3.

From this point of view Fig. 5 can be accepted as giving a reasonable 
picture of the relative merits of the two double sampling plans with 
respect to one another, and efficiency characteristics can be used to 
judge the merits of any double sampling plan. 

Fig. 5, however, may be misleading in another respect. In practical 
situations the majority of the lots submitted for inspection is accepted 
and only a small fraction of the lots is rejected. This means that most 
of the lots have a high probability of acceptance, say, 95 per cent or 
thereabout at least. Hence, in practice we will operate mainly on the 
left-hand side of Fig. 5 and the striking peaks of the efficiency curves 
are in reality of little interest because lots falling in that region are 
rare. Thus, without express warning Fig. 5 may easily lead to a biased 
interpretation. To avoid this, the part of Fig. 5 with probabilities of ac- 
ceptance greater than 0.80 has been specially indicated as the region 
of practical interest. In comparing different sampling plans we should 
pay attention to this part of the figure mainly. As a further application 
of efficiency characteristics we consider in Fig. 6 the set of plans 
D(2; 5, 10, 10), D(2; 4, 10, 10), D(2; 3, 10, 10), etc. We see at once 
that in the region of practical interest the plan D(2; 2, 10, 10) is the 
best. 

Next, fixing c1 at 2, we study the further improvement that can be
obtained if we abandon the c2=c3 principle (Fig. 7). As expected a
change from D(2; 2, 10, 10) to D(2; 2, 5, 10) gives an improved efficiency
mainly on the side of poor lots; in the region of practical interest the
effect is almost nil and both plans are equally satisfactory. If we
reduce c2 still further the opening in the first screen of the random walk
diagram becomes too narrow and the efficiency is impaired.


§5. TRUNCATION 


The efficiency characteristics so far considered were based on the 
assumption that all the samples are completely inspected. To reduce 
the number of observations truncation of the second sample is some- 
times recommended; that is, inspection of the second sample is stopped 
as soon as the total number of defectives observed exceeds the final
acceptance number c3. Evidently truncation will not alter the OC but
will improve the efficiency. This is again nicely demonstrated by the 
efficiency characteristics. 







To study the influence of truncation we have used the three double 
sampling plans already considered in Table 1, viz., D(2; 5, 13, 13),
D(2; 2, 9, 9) and D(2; 1, 4, 8). These have nearly the same relative 
slopes, h=2.21, 2.25 and 2.20 respectively, and if the sample sizes are 
properly adjusted, as in Table 1, they have nearly identical OC’s. 

The random walk diagrams are given in Fig. 8. The plan D(2; 5, 13, 
13) is of the type of Fig. 2, and is consequently not really efficient. 
With a plan of this type lots accepted as doubtful by the first sample 
have a high chance of being rejected after the second sample, and we 
may consequently expect truncation to produce a considerable im- 
provement. With the plan D(2; 2, 9, 9) the improvement will pre- 
sumably be less, though still pronounced because the upper half of the 
opening in the first screen is still too large. Finally, with the plan D(2; 
1, 4, 8) we should expect that truncation will have only a small influ- 
ence. 

The actual efficiency characteristics in Fig. 9 are in complete agree- 
ment with these predictions. The efficiencies for the truncated plans 
were computed from the formulas given in “Sampling Inspection” 
[6, p. 209]. 
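
The formulas of [6] are not reproduced here. As a hedged alternative, the sketch below computes the average sample size with and without truncation directly from the stopping rule, assuming items are inspected one at a time and are independently defective with probability p (a binomial model, not the Poisson tables used in the paper). The function names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k defectives among n items."""
    return comb(n, k) * p**k * (1 - p)**(n - k) if 0 <= k <= n else 0.0

def binom_cdf(c, n, p):
    """Probability of c or fewer defectives among n items."""
    return sum(binom_pmf(k, n, p) for k in range(min(c, n) + 1))

def average_sample_size(n1, n2, c1, c2, c3, p, truncate):
    """Expected number of items inspected under a double plan."""
    total = float(n1)                        # the first sample is always completed
    for k in range(c1 + 1, c2 + 1):          # first sample ends in the opening
        if truncate:
            need = c3 - k + 1                # defectives that push the total past c3
            # E[min(n2, T)], T = item at which the `need`-th defective appears:
            # sum over j < n2 of Pr(fewer than `need` defectives in j items).
            second = sum(binom_cdf(need - 1, j, p) for j in range(n2))
        else:
            second = float(n2)
        total += binom_pmf(k, n1, p) * second
    return total

# D(2; 1, 4, 8) with n1 = 75, n2 = 150 at 4 per cent defective.
print(average_sample_size(75, 150, 1, 4, 8, 0.04, truncate=False))
print(average_sample_size(75, 150, 1, 4, 8, 0.04, truncate=True))
```

Dividing either figure by the equivalent single sample size n0 gives the corresponding inverse efficiency; the gap between the two widens as the lots become poorer.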

It will be noted that in the region of practical interest the effect of 
truncation is relatively unimportant. In fact, truncation is only per- 
mitted in the second sample and a second sample occurs frequently 
only with relatively poor lots. In actual practice second samples are 
rare and consequently the reduction in the amount of inspection result- 
ing from truncation is small. 

Whether we use truncation or not the double plan D(2; 2, 9, 9) is 
always considerably better than the plan D(2; 5, 13, 13); the latter 
forms part of Mil. Std. 105A and Fig. 9 illustrates the improvement 
that can be achieved by a better choice of acceptance numbers. 


§6. TRUNCATION AND THE CHOICE OF DOUBLE SAMPLING PLANS 


Dodge and Romig’s original sampling tables [1] contain double 
sampling plans for which c2=c3 and since then all double sampling plans
proposed for practical use have been of the same type. Dodge and Ro- 
mig did not mention that a specific choice was involved and so far 
little or no attention has been paid to this problem. It has been dis- 
cussed in a report by Stein and Shaw [7] which is, however, not readily 
accessible and is not cited in the main textbooks on the subject. Hence 
we shall devote some space to it here. 

The choice c2=c3 can be justified as simplifying the specification
of a double sampling plan, while the loss in efficiency occurs in a region
of small practical importance (Fig. 9). On the other hand if we take
c2=c3 and then recommend truncation we simplify on one hand and
complicate matters on the other; and it may well be questioned whether
this is correct. In fact we see from Fig. 9 that the plan D(2; 2, 9, 9)
with truncation has nearly the same efficiency characteristic as D(2; 1,
4, 8) without truncation. Which of these two plans is to be preferred in
practice?


Fig. 8. Random walk diagram for three double plans with nearly the same
slope h. The efficiency of these plans without and with truncation is shown in
Fig. 9. (Panels: D(2; 5, 13, 13), h=2.21; D(2; 2, 9, 9), h=2.25; D(2; 1, 4, 8),
h=2.20.)

Fig. 9. The effect of truncation on efficiency. Efficiency characteristics for the
plans D(2; 5, 13, 13), D(2; 2, 9, 9) and D(2; 1, 4, 8) both without and with
truncation. (Abscissa: probability of acceptance; ordinate: inverse efficiency.)

In the Mil. Std. 105A the acceptance and rejection numbers are 
given in full; the entries corresponding to the two sampling plans just 
mentioned would read for D(2; 2, 9, 9) 


Accept Reject 


First sample 2 10 
Second sample 9 10 


and for D(2; 1, 4, 8) 
Accept Reject 
First sample 1 5 
Second sample 8 9 


the advantage of c2=c3 being that the rejection numbers are the same
in both samples. We do not believe, however, that this is a real practical 
advantage. As a rule the operator inspects a sample setting defectives 
apart; when inspection is completed he counts the number of defectives 
and then consults the table as to the decision to be taken. He will 
probably have to consult the table after the second as well as after the 
first sample, and if so there is no specific advantage in having the same 
rejection numbers in both cases. 
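
The two small tables follow directly from the decision numbers, as the sketch below illustrates. It assumes the convention used throughout the paper (accept at c1 or c3, take a second sample up to c2, reject above); the function name `decision_table` is hypothetical.

```python
def decision_table(c1, c2, c3):
    """Acceptance and rejection numbers in the layout of Mil. Std. 105A."""
    return {
        "first sample": {"accept": c1, "reject": c2 + 1},    # more than c2 rejects at once
        "second sample": {"accept": c3, "reject": c3 + 1},   # judged on the total sample
    }

for plan in [(2, 9, 9), (1, 4, 8)]:
    print(plan, decision_table(*plan))
```

With c2=c3 both rejection numbers coincide, which is the simplification discussed in the text.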

With truncation it is another matter. Truncation is only permitted 
in second samples and these occur infrequently. The routine will be to 
complete the inspection of a sample and count the number of defectives 
afterwards. But with truncation the inspector will suddenly have to 
keep an eye on the number of defectives whenever he has to inspect a 
second sample. Hence truncation necessitates irregular and infrequent 
breaks in the normal inspection routine, which will be inconvenient 
in practice. 

Hence we are of the opinion that if the acceptance and rejection
numbers are given as fully as in the Mil. Std. 105A, sampling plans with
c2≠c3 will be more satisfactory in practice than plans with c2=c3
combined with truncation.

It may be added that we do not know how far truncation is really
applied. It is recommended in most textbooks, but in Mil. Std. 105A,
which is primarily meant for immediate practical use, truncation is not
mentioned at all. We are inclined to conclude from this that truncation
has been found inconvenient, perhaps for the reasons we have
given above. Some practical information on this point would be
desirable.

In conclusion it may be emphasized once more that in the region of
practical interest the gain from either truncation or c2≠c3 is small
anyhow.


Fig. 10. The effect of the ratio of the sample sizes, when c2=c3.
(Inverse efficiency against probability of acceptance; legible curves:
I, D(1; 17, 40, 40); II, D(2; 11, 40, 40); III, D(3; 8, 40, 40).)

Fig. 11. The effect of the ratio of the sample sizes when c2≠c3.
(Inverse efficiency against probability of acceptance; curves:
I, D(1; 17, 22, 40); II, D(2; 11, 15, 40); III, D(3; 8, …, 40);
IV, D(4; 6, 9, 40).)


§7. THE RATIO OF THE SAMPLE SIZES 


In the original double sampling tables of Dodge and Romig [1] the
ratio of the sample sizes, n2/n1, varied somewhat but nearly all later
tables have used a constant ratio, n2/n1=2. By the method of the
foregoing sections we can now easily settle how far this is a satisfactory
choice.

In Figs. 10 and 11 we give the efficiency characteristics for sampling
plans with n2/n1=1, 2, 3, and 4. In computing these curves large
acceptance numbers have been used so that c1 and c2 can be fairly
accurately adjusted to give optimum efficiency. We see that for good lots
the greater the ratio n2/n1 the better the efficiency. At a probability of
acceptance of 95 per cent the value of n2/n1 has comparatively little
influence and to the right of this point a high ratio n2/n1 leads to
decreased efficiency. The differences in this region are larger for c2=c3
than for c2≠c3 plans (bearing in mind a difference in the scales of Figs.
10 and 11). All this is reasonable enough.

On the whole n2/n1=2 seems a satisfactory choice. When quality is
very good and the rejection of lots rare, a higher ratio may be of some 
advantage, but it may then be still better to stop sampling inspection 
altogether. 

We have now discussed the efficiency of double sampling plans in all
its various aspects. In future papers we intend to deal with the choice
of a sampling plan in practice and to provide a set of tables which 
enable a rapid choice of appropriate single and double sampling plans. 


APPENDIX 


FORMULAS FOR THE COMPUTATION OF ps9 AND h FOR DOUBLE SAMPLING PLANS 
We shall write 


e~™m* 


= (9) 


q(k; m) = 


R(c; m) = > q(k; m), 
0 


Q(c;m) = 1 —R(c;m) = DY qk; m). 


kec+l 





pOUBLE SAMPLING FOR ATTRIBUTES 


= g(k — 1; m) — q(k; m), 


room gtk; m) 
q(k; m) — 


= + 9(c;m) = Q(c — 1;m) — Q(c;m), 


, dQ(c; 
O(c; m) = SEim 


formulas which can be used also for k=0 or c =0 if we define 
q(—1; m) = 0, Q(-1; m) =1. 
The probability of acceptance for a double sampling plan with cz =cy is 


P = R(ei;m) + + q(k; m1) R(c2 — k; m2), 


kemey+l 
where 
m = N?P, Me = Nop, 


and p = the fraction defective in the lot. 
Furthermore we have 


R{ ce; (m: + m:)} = E alk m)R(c2 — k; ms) 


and making use of this relation (14) can be rewritten 


P = R(ci;m) + R(cz3 m + me) — Do glk; m)R (ce. — k; me), (16) 
k=O 
or introducing Q instead of R 
P = 1 — Q(z} m + m2) + } a q(k; mi)Q(cz — k; mz). (17) 
kad 


Since for a specific plan the sample sizes n; and nz are fixed we have for varying p 


dP dP 
aa a ae Gb <n De aes 
h 2p dp m™ (18) 
and 
dm: Ne 


19 
dm nN ( ) 


Differentiation of (17) yields 
‘ dP 
—-P = — —— = q(t; m)Q(c2 — c: — 1; m2) 


dm 1 
+ ( + “) [ aCe m, + mz) + Do glk; m)Q(c2 — kj; m2) (20) 
nm k= 


~ $ oh: Oe -— 8 ~ teed] 
k=O 


Since the functions q(k; m) and Q(c; m) have been tabulated in extenso by Molina
[5], formulas (17) and (20) are convenient for practical use. The computation
of a sum of products from 0 to $c_1$ involves as a rule less labor than from $(c_1 + 1)$ to
$c_2$, particularly when $c_2 = c_3$.
When $c_2 \neq c_3$ we have

$$P = R(c_1; m_1) + \sum_{k=c_1+1}^{c_2} q(k; m_1)\,R(c_3 - k; m_2)$$

or

$$P = 1 - Q(c_2; m_1) - \sum_{k=c_1+1}^{c_2} q(k; m_1)\,Q(c_3 - k; m_2), \tag{21}$$







and by differentiation

$$-P' = -\frac{dP}{dm_1} = q(c_2; m_1) - q(c_2; m_1)\,Q(c_3 - c_2 - 1; m_2) + q(c_1; m_1)\,Q(c_3 - c_1 - 1; m_2) + \left(1 + \frac{n_2}{n_1}\right)\left[\sum_{k=c_1+1}^{c_2} q(k; m_1)\,Q(c_3 - k - 1; m_2) - \sum_{k=c_1+1}^{c_2} q(k; m_1)\,Q(c_3 - k; m_2)\right]. \tag{22}$$

It is now no longer possible to convert the summations but this is not necessary
either, because when $c_2 \neq c_3$, $c_1$ and $c_2$ are lying fairly close together and a
summation of products from $c_1 + 1$ to $c_2$ is brief and simple.

The general policy was to compute P and $-P'$ for two values of $m_1$ which gave
P values in the immediate neighborhood of P = 0.50, and then to find the correct
values by inverse interpolation. Table 3 gives full details of the numerical
procedures.
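
As a present-day cross-check on the hand computation, formulas (21) and (22) can be evaluated directly rather than read from Molina's tables. The following Python sketch is an illustration only and not part of the original procedure. The names `q`, `Q` and `acceptance` are hypothetical, the Poisson terms are computed from definition (9), and the intermediate quantities a to f follow the row labels of Table 3.

```python
import math


def q(k, m):
    """Poisson probability of exactly k defectives for mean m, with q(-1; m) = 0."""
    if k < 0:
        return 0.0
    return math.exp(-m) * m ** k / math.factorial(k)


def Q(c, m):
    """Poisson probability of more than c defectives, with Q(-1; m) = 1."""
    if c < 0:
        return 1.0
    return 1.0 - sum(q(k, m) for k in range(c + 1))


def acceptance(m1, ratio, c1, c2, c3):
    """Return (P, -dP/dm1) for the plan D(ratio; c1, c2, c3), ratio = n2/n1."""
    m2 = ratio * m1
    ks = range(c1 + 1, c2 + 1)          # first-sample results that call for a second sample
    a = 1.0 - Q(c2, m1)
    b = q(c2, m1)
    c = q(c2, m1) * Q(c3 - c2 - 1, m2)
    d = q(c1, m1) * Q(c3 - c1 - 1, m2)
    e = sum(q(k, m1) * Q(c3 - k, m2) for k in ks)
    f = sum(q(k, m1) * Q(c3 - k - 1, m2) for k in ks)
    prob = a - e                                      # formula (21)
    neg_slope = b - c + d + (1.0 + ratio) * (f - e)   # formula (22)
    return prob, neg_slope
```

For the plan D(2; 1, 4, 8), the call `acceptance(2.9, 2, 1, 4, 8)` should agree with the first column of Table 3 to about four decimal places.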

TABLE 3
ILLUSTRATING THE NUMERICAL COMPUTATION OF n1 p50
AND h FOR THE SAMPLING PLAN D(2; 1, 4, 8)

                                                     m1 = 2.9     m1 = 3.0
                                                     m2 = 5.8     m2 = 6.0

a = 1 - Q(c2; m1)                                    0.831777     0.815263
b = q(c2; m1)                                        0.162154     0.168031
c = q(c2; m1) Q(c3 - c2 - 1; m2)                     0.134594     0.142624
d = q(c1; m1) Q(c3 - c1 - 1; m2)                     0.057701     0.058803
e = sum_{k=c1+1}^{c2} q(k; m1) Q(c3 - k; m2)         0.311792     0.332528
f = sum_{k=c1+1}^{c2} q(k; m1) Q(c3 - k - 1; m2)     0.409015     0.426992

P = a - e                                            0.519985     0.482735
-P' = b - c + d + 3(f - e)                           0.376930     0.367602

By inverse interpolation

m1 = n1 p50                                          2.954
-P'50                                                0.3726
h = -2 m1 P'50                                       2.201
c0                                                   6.88
n0 p50                                               7.551
n1/n0                                                0.392








The value of $m_1 = n_1 p_{50}$ was obtained by inverse linear interpolation: the OC in
the region between P = 0.52 and 0.48 is sufficiently straight to arrive at an
accurate value that way.







To obtain the derivative $P'_{50}$ a somewhat more accurate method of interpolation
was found necessary. For two values of $m_1$, $m_{10} = 2.9$ and $m_{11} = 3.0$, we have
computed the corresponding values $P_0$, $P_1$, and $P'_0$, $P'_1$. We may then adjust a
third degree equation so that it passes through the points $(m_{10}, P_0)$ and $(m_{11}, P_1)$
and has the correct slopes. From this equation we can next derive the slope at
the point $m_1$, $P = \tfrac{1}{2}$. This yields the interpolation formula

$$P'_{50} = \left(\frac{P_1 - P_0}{m_{11} - m_{10}}\right) 6\theta\epsilon + P'_0\,\epsilon(1 - 3\theta) + P'_1\,\theta(1 - 3\epsilon),$$

where

$$\theta = \frac{m_1 - m_{10}}{m_{11} - m_{10}}, \qquad \epsilon = (1 - \theta).$$
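
The two interpolation steps can be sketched in a few lines of Python. This is an illustrative reading of the procedure described above, not the authors' own worksheet. The function names `inverse_linear` and `hermite_slope` are hypothetical, and the inputs are the P and P' values of Table 3 with their signs restored.

```python
def inverse_linear(m10, m11, P0, P1, target=0.5):
    """Inverse linear interpolation: the m1 at which P equals the target."""
    theta = (P0 - target) / (P0 - P1)
    return m10 + theta * (m11 - m10), theta


def hermite_slope(m10, m11, P0, P1, dP0, dP1, theta):
    """Slope at theta of the cubic matching values and slopes at both ends."""
    eps = 1.0 - theta
    secant = (P1 - P0) / (m11 - m10)
    return (6.0 * theta * eps * secant
            + dP0 * eps * (1.0 - 3.0 * theta)
            + dP1 * theta * (1.0 - 3.0 * eps))


# Values for the plan D(2; 1, 4, 8) taken from Table 3 (P' is negative).
m1, theta = inverse_linear(2.9, 3.0, 0.519985, 0.482735)
slope = hermite_slope(2.9, 3.0, 0.519985, 0.482735, -0.376930, -0.367602, theta)
h = -2.0 * m1 * slope
```

The resulting `m1`, `slope` and `h` should land close to the tabulated 2.954, -0.3726 and 2.201. At theta = 0 and theta = 1 the formula returns the end slopes exactly, which is a convenient check on the coefficients.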

This formula we have used. Originally a cruder method of computation was 
adopted but this led to slight discrepancies which could only be removed by using 
a more accurate technique. By using two copies of Molina’s tables so that the 
tables of individual and cumulative Poisson distributions can be laid side by side, 
the entire computation can be carried out straight from the tables. Routine com- 
putations of h for a particular double sampling plan do not take more than half 
an hour. It should be noted that 


$$P_M(c; m) = Q(c - 1; m) \tag{23}$$


when $P_M$ denotes the cumulative probabilities as tabulated by Molina.
At any value of m the probability of a second sample is

$$\sum_{k=c_1+1}^{c_2} q(k; m_1) = Q(c_1; m_1) - Q(c_2; m_1);$$

hence the inverse efficiency is

$$\text{I.E.} = \frac{n_1}{n_0}\left[1 + \frac{n_2}{n_1}\{Q(c_1; m_1) - Q(c_2; m_1)\}\right] \tag{25}$$

while the corresponding probability of acceptance is provided by (17) or (21).


REFERENCES 


[1] Dodge, Harold F., and Romig, Harry G., Sampling Inspection Tables, New
York; John Wiley and Sons, Inc., 1944; or Bell System Technical Journal 8
(1928) 613-31.

[2] Hamaker, H. C., “Some notes on lot-by-lot inspection by attributes,” Re- 
view of the International Statistical Institute 18 (1950), 179-96. 

[3] Hamaker, H. C., “The theory of sampling inspection plans,” Philips Tech- 
nical Review, 11 (1950), 260-70. 

[4] Military Standard 105A: Sampling Procedures and Tables for Inspection by
Attributes. U. S. Government Printing Office. The Standard is reproduced in
full in “Quality Control and Industrial Statistics” by A. J. Duncan, Chicago;
R. D. Irwin Inc., 1952.

[5] Molina, Edward C., Poisson’s Exponential Binomial Limit, New York; Van 
Nostrand, 1947. 

[6] Statistical Research Group, Columbia University, Sampling Inspection, New 
York; McGraw-Hill Book Co., 1948. 

[7] Stein, Arthur, and Shaw, Lawrence W., “Some methods of reducing the 
amount of inspection in the application of double sampling inspection pro- 
cedures.” Ballistic Research Laboratory Report No. 248, Aberdeen Proving 
Ground, 1943. 





GENERALITY OF CONFIDENCE INTERVALS FOR A 
REGRESSION FUNCTION 


Edwin L. Crow*
U. S. Naval Ordnance Test Station, China Lake 


In obtaining a confidence interval for the ordinate to a regression line
(or surface, more generally) it is commonly assumed in statistics
books, either explicitly or implicitly in the course of derivation, that
the “independent” variable X is assigned fixed values, i.e., that in the
hypothetical repeated sampling encompassed by the probability model
the values of X must be the same as those in the sample actually observed
(e.g., Cramér [3, pp. 548-54]). On the other hand it is fairly
common to find applications of such confidence intervals in which the 
values of X evidently arose at random and would not be repeated in a 
further sample, except under hypothetical restriction; usually this 
restriction is not mentioned. Hald [6, pp. 522, 627] gives a unified treat- 
ment in which the independent variable is explicitly allowed to take 
either fixed or random values, but he does not demonstrate why results 
obtained in the case of fixed values (pp. 528-40) also apply in the case 
of random values (p. 616). It seems desirable here explicitly to bridge 
this gap, and it is easily done. The underlying idea is contained in 
Bartlett’s [2] consideration of conditional variation of a sample and 
in Fisher’s [5] consideration of samples having the same configuration 
(cf. [8]). 

The argument used is familiar in the theory of confidence intervals. 
Suppose, for example, that a random sample $(x_1, y_1, \cdots, x_n, y_n)$ is
drawn from a bivariate normal population of (X, Y), and we calculate
from the theory appropriate for fixed $(x_1, \cdots, x_n)$ a confidence interval
with confidence coefficient $\gamma$ for the ordinate to the regression line at a
prescribed value of X. It follows that in indefinitely repeated hypothetical
sampling, in which each sample is taken from the normal population
under the restriction $(X_1, \cdots, X_n) = (x_1, \cdots, x_n)$, the proportion
of confidence intervals covering the
true regression ordinate is $\gamma$. But this is true whatever the $x_i$; hence
the probability that the confidence interval covers the true regression
ordinate (without restriction on the $x_i$) is


$$P = \int \cdots \int \gamma\, f(x_1, \cdots, x_n)\, dx_1 \cdots dx_n = \gamma,$$





* Now at the Boulder Laboratories of the National Bureau of Standards. 




where $f(x_1, \cdots, x_n)$ represents the joint probability density function
of the n random sample values of X. In other words, the confidence 
interval calculated for fixed values of X is a confidence interval with 
the same confidence coefficient without restriction on the values of X. 
The proof is seen to consist merely of averaging a constant, y, over 
the distribution of all possible samples of X of size n. 
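
The averaging argument can be checked empirically. The following Python sketch is an illustration under stated assumptions and is not part of the original note: the sample size n = 10, the regression parameters, the abscissa x0 = 0.5 and the function names are all arbitrary choices, and 2.306 is the two-sided 95 per cent point of Student's t for 8 degrees of freedom. The values of X are drawn afresh in every sample, yet the interval is computed from the fixed-X formula.

```python
import math
import random

T_975_DF8 = 2.306  # two-sided 95 per cent point of Student's t, 8 degrees of freedom


def interval_covers(rng, n=10, alpha=1.0, beta=2.0, sigma=1.0, x0=0.5):
    """Draw one bivariate normal sample and report whether the fixed-X
    confidence interval for the regression ordinate at x0 covers the truth."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]            # X arises at random
    ys = [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(rss / (n - 2))                             # residual standard deviation
    half = T_975_DF8 * s * math.sqrt(1.0 / n + (x0 - xbar) ** 2 / sxx)
    return abs((a + b * x0) - (alpha + beta * x0)) <= half


def coverage(reps=4000, seed=1955):
    """Proportion of intervals covering the true ordinate over many samples."""
    rng = random.Random(seed)
    return sum(interval_covers(rng) for _ in range(reps)) / reps
```

Under these assumptions `coverage()` should come out near 0.95, in line with the statement that the confidence coefficient is unchanged when X is random.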

The same reasoning can be stated more generally, so that it applies 
to a multiple regression ordinate, to a total or partial regression co- 
efficient [1, 3, pp. 550, 553], and to the difference between two regres- 
sion coefficients for two samples from possibly different multivariate 
normal populations. With respect to the last result it is interesting to 
note that Bartlett [1] specifically retained the requirement of fixed 
values of X in the two samples, although he obtained the more general 
result for regression coefficients in a single sample. In general (as suggested
by Edward A. Fay), for any distribution F(x, y) of (generally
vector-valued) random variables X and Y, if R(Y | x) is a confidence
region for a parameter $\theta$ having confidence coefficient $\gamma$ with respect
to each conditional distribution of Y given that X = x, then R(Y | X)
(where X is a random variable) is a confidence region for $\theta$ having the
same confidence coefficient with respect to the joint distribution F.

The analogous remarks can be made for tolerance [11] and predic- 
tion [9] intervals. 

It would appear that the introduction of additional random fluctuation,
even though it be only through the values of X, should be reflected
somehow in the inference, whereas the above intervals are formally
identical. Thus the variance of the regression coefficient in bivariate
normal regression is n/(n-3) times the corresponding variance
with x values held fixed (as may be seen from [3, pp. 402, 549], where
the sample variance $\sum_i (x_i - \bar{x})^2/n$ in the latter case is also equal to
the population variance), but when the regression coefficient is “Studentized”
to obtain a t statistic, the same result is obtained for the two
cases. With the same confidence coefficient the confidence intervals 
are identical in the two cases, but the probabilities of covering some 
value in the parameter space other than the true value are different. 
Likewise, with the same significance level (probability of a Type I 
error) the tests of a hypothetical value of a regression coefficient which 
can be derived from the corresponding confidence intervals are formally 
identical, but the probabilities of a Type II error differ. 

The preceding statement can be illustrated by reference to the operating
characteristic of the t-test graphed for a significance level of 0.05
by Ferris, Grubbs, and Weaver [4]. Suppose a sample of size n' is drawn







from a bivariate normal population (X, Y) for which the regression
coefficient of Y on X is $\beta$, the standard deviation of X is $\sigma_x$, and the
standard deviation of the Y deviations from the regression line of Y on X
is $\sigma$. It is desired to test the null hypothesis $H_0$: $\beta = 0$, while actually
$|\beta| = \lambda'\sigma$. The conditional probability of accepting $H_0$ for any
particular fixed set of x's can be found from the referenced graph by taking
$n = n' - 1$, $\lambda = \lambda' s_x$, where $s_x^2 = \sum_i (x_i - \bar{x})^2/n$. For example, if $n' = 5$,
$\lambda' = 1.5$, $s_x = 1$, then the graph yields the probability 0.47. The
unconditional probability of a Type II error can be obtained by numerically
integrating the product of such a probability (read from the graph)
and the probability density of $s_x^2 = \sigma_x^2 \chi_n^2/n$ where $\chi_n^2$ is chi-square
with $n = n' - 1$ degrees of freedom. The integral is a function of $\sigma_x$; if
$\sigma_x = 1$ in the above example, the result by Simpson’s Rule is 0.54,
appreciably larger than the conditional probability 0.47 obtained with
$s_x = 1$. If the median value 0.916 of $s_x$ is used for comparison, the
conditional probability is 0.53, but the 5 and 95 per cent points of $s_x$ yield
conditional probabilities of 0.15 and 0.84. Thus the probability of a
Type II error in testing a regression coefficient in the bivariate normal
case can hardly be calculated using the sample $s_x$ if the sample size is
small. Patnaik [10] has given four numerical examples of the entire 
operating characteristics (actually the complementary power function 
curves) for both the fixed and random cases in the analogous problem 
of one-way analysis of variance. In every example the operating char- 
acteristic for the fixed case lies below the other, and Patnaik conjec- 
tured that this would be true in general. Johnson [7] confirmed Pat- 
naik’s conjecture for the usual range of significance levels (0.001 to 
0.10) but showed it unlikely to be universally true. 

Welch [12] has pointed out, with the aid of an example, that a test 
based on the distribution in samples of the same configuration, as 
above, may not be the best possible test, even though it is the best 
possible test within any one configuration. 


REFERENCES 


[1] Bartlett, M. S., “On the theory of statistical regression,” Proceedings of the
Royal Society of Edinburgh, 53 (1933), 260-83.

[2] Bartlett, M. S., “Properties of sufficiency and statistical tests,” Proceedings
of the Royal Society, A, 160 (1937), 268-82.

[3] Cramér, Harald, Mathematical Methods of Statistics. Princeton: Princeton
University Press, 1946.

[4] Ferris, Charles D., Grubbs, Frank E., and Weaver, Chalmers L., “Operating
characteristics for the common statistical tests of significance,” Annals of
Mathematical Statistics, 17 (1946), 178-97, in particular Fig. 6, p. 195.

[5] Fisher, R. A., “Two new properties of mathematical likelihood,” Proceedings
of the Royal Society, A, 144 (1934), 285-307.

[6] Hald, A., Statistical Theory with Engineering Applications. New York:
John Wiley & Sons, 1952.

[7] Johnson, N. L., “Comparison of analysis of variance power functions in the
parametric and random models,” Biometrika, 39 (1952), 427-29.

[8] Kendall, Maurice G., The Advanced Theory of Statistics, Vol. II, London:
Charles Griffin and Company Limited, 1946. Secs. 21.47, 22, 19-21.

[9] Mood, Alexander McFarlane, Introduction to the Theory of Statistics. New
York, Toronto, London: McGraw-Hill Book Company, Inc., 1950. Pp.
297-9, 304-5.

[10] Patnaik, P. B., “The non-central χ²- and F-distributions and their
applications,” Biometrika, 36 (1949), 202-32 (226-27 in particular).

[11] Wallis, W. Allen, “Tolerance intervals for linear regression,” in Proceedings
of the Second Berkeley Symposium on Mathematical Statistics and Probability,
ed. by J. Neyman, Berkeley and Los Angeles: University of California
Press, 1951. Pp. 43-51.

[12] Welch, B. L., “On confidence limits and sufficiency, with particular
reference to parameters of location,” Annals of Mathematical Statistics, 10 (1939),
58-69.





DISTRIBUTIONS OF SOLUTIONS OF A SET OF LINEAR 
EQUATIONS (WITH AN APPLICATION TO 
LINEAR PROGRAMMING)* 


M. M. Babbar
University of Costa Rica 


This article presents an approach to deriving distributions of the
variables representing the solution of a set of simultaneous linear
equations, when the coefficients are subject to random errors. In addition,
the distribution of a linear function of these variables is also discussed.
Specifically, the model considered is as follows:


$$(B + b)X = (Q + \epsilon),$$


where B is an m × m nonsingular matrix of known constants and b is
an m × m matrix of random errors, Q is an m-column vector of known
constants and $\epsilon$ is an m-column vector of the corresponding random
errors. The problem is to derive the distribution of a general element of
X and also the distribution of a linear function:


$$(C + c)'X,$$


where C is an m-column vector of known constants and c an m-column
vector of the corresponding random errors.

A similar problem of errors in solutions of a set of linear equations 
when the coefficients are subject to error has been considered by many 
authors, for example, Lonseth [14], Hotelling [12], and Turing [18], 
but most of them have been concerned with “rounding” errors. Dwyer 
has an elaborate section in his book Linear Computations [7] devoted 
to the problem of the solution of linear equations with coefficients sub- 
ject to errors, but the errors again are not considered to be random and 
the treatment, therefore, is not based on the theory of probability. 
Recently, Box and Hunter [2], by an extension of the argument given 
by Fieller [8], have presented a formula for obtaining a confidence 
region for the solution of a set of linear equations where the errors in 
the coefficients have a multivariate normal distribution. The general 





* This article is based on the author's dissertation [1] submitted to the Graduate Faculty of Iowa 
State College in partial fulfillment of the requirements for the degree of Doctor of Philosophy. He is 
indebted to Gerhard Tintner for suggesting the problem and for guiding in its solution and to Earl O.
Heady for setting up the empirical example and helping in its analysis. Also, he acknowledges the im- 
provements due to the suggestions and criticism made by other members of the staff of the departments 
of economics, statistics, and mathematics of Iowa State College. 




derivation does not assume the errors to be uncorrelated, and relatively 
easier formulas are derived when the errors are assumed to be uncorre- 
lated. If the equations are poorly “conditioned,” however, there may 
be serious difficulties in the empirical application of the results. 

In the following pages, the treatment of the problem is rather differ- 
ent. Actual approximate distributions of the variables which occur as 
a solution of the set of equations as well as the distribution of a linear 
function are derived. Also, direct and easily applicable formulas are 
presented to find the confidence limits of the variables and the linear 
function. Moreover, the probability that any variable or the linear 
function assumes a value greater than or less than any preassigned 
number can be investigated easily. 

The approach outlined in this paper, however, involves a high degree 
of approximation, though situations are mentioned in which it will be 
exact. The compensating aspect is that the analysis can be applied 
without much tedious computation or tabulation. 

Problems involving the solution of a set of linear equations occur fre- 
quently in many applied fields. In economics, for example, Leontief- 
type [13] input-output analysis depends on such a solution. In the 
general area of linear programming, Dorfman [6], making certain as- 
sumptions which guarantee non-degeneracy, has shown that the solu- 
tion which maximizes the linear objective function will involve activi- 


ties, active or disposal, equal in number to the linear restrictions. Thus, 
if the variables irrelevant to the planned program are removed, the 
solution can be written as 


BX = Q. 


When this plan is put in practice, however, the “operating” values 
of the input coefficients represented by the elements of B may be dif- 
ferent from their “anticipated” values used in planning. Thus, if ran- 
dom fluctuations in these coefficients are allowed, the operating model 
will be of the form 


$$(B + b)X = (Q + \epsilon).$$


Therefore the analysis of this paper can be usefully employed to 
predict the variation in the values of X and also to predict the variation 
in the linear objective function. In Section III a detailed example is 
discussed to illustrate this application. 

The treatment in Sections I and II is, however, quite general and it 
could be applied to experimental designs and problems of the physical 
sciences which involve the solution of linear equations. 







I. REDUCTION OF THE MODEL 
A. Consider a set of m linear equations in m variables.

$$(B + b)X = (Q + \epsilon) \tag{1.1}$$

where B is an m × m non-singular matrix whose elements are known constants
and Q is an m-column vector whose elements $(q_i)$ are also known.
Small b represents an m × m matrix of random errors whose elements
$(b_{ij})$ have known probability distributions, such that

$$E(b_{ij}) = 0 \quad\text{and}\quad E(b_{ij}^2) = \sigma_{ij}^2 \qquad \text{for } i = 1, 2, \cdots, m;\ j = 1, 2, \cdots, m.$$

E denotes the mathematical expectation.
Matrix (B + b) is also assumed to be non-singular.

Since the elements of the matrix b are, later on, assumed to be normally
distributed on the real line, this assumption that (B + b) be non-singular
can be achieved with probability one.

Further, $\epsilon$ is an m-column vector of errors $(\epsilon_i)$ whose elements have
known probability distributions such that

$$E(\epsilon_i) = 0 \quad\text{and}\quad E(\epsilon_i^2) = \tau_i^2 \qquad \text{for } i = 1, 2, \cdots, m.$$

Our aim is to derive approximate distributions for the variables
$x_1, x_2, \cdots, x_m$ which occur in the model as a solution of the linear
equations (1.1).

Let

$|B|$ denote the determinant of the matrix $(B_{ij})$,
$|B + b|$ denote the determinant of the matrix $(B_{ij} + b_{ij})$,
$\beta_{ij}$ denote the co-factor of the element $B_{ij}$ in $|B|$,
$|D^k|$ denote the determinant of matrix $(B_{ij})$ when its k-th column
is replaced by the column $(q_i)$,
$D_{ij}^{k}$ denote the co-factor of the element in the i-th row and j-th
column of $|D^k|$,
$|D^k + d^k|$ denote the determinant of the matrix $(B_{ij} + b_{ij})$ when its
k-th column is replaced by column vector $(q_i + \epsilon_i)$.

We know that the solution of equations (1.1) is given by

$$x_k = \frac{|D^k + d^k|}{|B + b|} \tag{1.2}$$

for $k = 1, 2, \cdots, m$.










Also, we will be interested in finding the distribution of a linear function
of the solutions $x_i$, $i = 1, 2, \cdots, m$. For this purpose, let C be an m-column
vector of fixed constants $(C_1, C_2, \cdots, C_m)$ and c be a column
vector of errors $(c_1, c_2, \cdots, c_m)$, where $E(c_i) = 0$, $i = 1, 2, \cdots, m$, and
$E(c_i^2) = \omega_i^2$, $i = 1, 2, \cdots, m$. We consider the linear function

$$y = \sum_{i=1}^{m} (C_i + c_i)\,x_i = \frac{1}{|B + b|} \sum_{r=1}^{m} (C_r + c_r)\,|D^r + d^r|. \tag{1.3}$$

B. Rule of procedure for an approximation. Now the determinants 
involved in expressions (1.2) and (1.3) will be reduced to approximate 
expressions using the following rule of procedure. 

We will ignore all cross products of errors of the second and higher 
orders in the expansion of a determinant whose elements involve errors. 

It may be noticed that although our rule of procedure does not as- 
sume the square and higher powers of the errors to be negligible, they 
do not occur in the expansion of the determinants anyway, since no 
element is multiplied by itself in the expansion of a determinant. 
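
The effect of the rule is easy to see numerically. The following Python sketch is illustrative only, with hypothetical function names and arbitrary numbers; it expands |B + b| to first order in the errors using the co-factors of B. For a 2 × 2 matrix the neglected part is exactly the determinant of b, that is, the cross products of the errors.

```python
def det(a):
    """Determinant by expansion along the first row (adequate for small m)."""
    n = len(a)
    if n == 0:
        return 1.0
    if n == 1:
        return a[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * a[0][j] * det(minor)
    return total


def cofactor(a, i, j):
    """Signed minor of the element in row i, column j (0-based indices)."""
    minor = [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]
    return (-1) ** (i + j) * det(minor)


def first_order_det(B, b):
    """|B| plus the terms linear in the errors; cross products are ignored."""
    m = len(B)
    return det(B) + sum(cofactor(B, i, j) * b[i][j]
                        for i in range(m) for j in range(m))


B = [[2.0, 1.0], [1.0, 3.0]]            # known constants (arbitrary example)
b = [[0.01, -0.02], [0.03, 0.01]]       # one realization of the errors
exact = det([[B[i][j] + b[i][j] for j in range(2)] for i in range(2)])
approx = first_order_det(B, b)
neglected = exact - approx              # equals det(b) in the 2 x 2 case
```

When the errors are small relative to the elements of B, the neglected part is of second order in the errors, which is the sense in which the expansions are approximate.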

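C. Approximate formulas. Using the rule of procedure given above,
we get expansions of the determinants involved in expressions (1.2)
and (1.3) as follows:

$$|D^k + d^k| \cong |D^k| + \sum_{i=1}^{m} \beta_{ik}\,\epsilon_i + \sum_{i=1}^{m} \sum_{\substack{j=1 \\ j \neq k}}^{m} D_{ij}^{k}\, b_{ij} = N(x_k) \text{ say}. \tag{1.4}$$

Similarly,

$$|B + b| \cong |B| + \sum_{i=1}^{m} \sum_{j=1}^{m} \beta_{ij}\, b_{ij} = D(x) \text{ say}. \tag{1.5}$$

Now,

$$E(|D^k + d^k|) = |D^k| = \delta_k \text{ say}, \tag{1.6}$$

$$V(|D^k + d^k|) = \sum_{i=1}^{m} (\beta_{ik})^2 \tau_i^2 + \sum_{i=1}^{m} \sum_{\substack{j=1 \\ j \neq k}}^{m} (D_{ij}^{k})^2 \sigma_{ij}^2 = \sigma_k^2. \tag{1.7}$$

V denotes the variance.

Again,

$$E(|B + b|) = |B| = \beta \text{ say}, \tag{1.8}$$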






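and

$$V(|B + b|) \cong \sum_{i=1}^{m} \sum_{j=1}^{m} (\beta_{ij})^2 \sigma_{ij}^2 = \sigma_\beta^2 \text{ say}. \tag{1.9}$$

Also

$$\sigma(|D^k + d^k|,\ |B + b|) = \sum_{i=1}^{m} \sum_{\substack{j=1 \\ j \neq k}}^{m} D_{ij}^{k}\, \beta_{ij}\, \sigma_{ij}^2 = \sigma_{\beta k} \text{ say}. \tag{1.10}$$

$\sigma(\cdot\,,\cdot)$ denotes the covariance.
We are interested in finding the distribution of the quotient
corresponding to (1.2)

$$x_k = \frac{N(x_k)}{D(x)}. \tag{1.11}$$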


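For the reduction of expression (1.3) we have,

$$\sum_{r=1}^{m} (C_r + c_r)\,|D^r + d^r| \cong \sum_{r=1}^{m} C_r |D^r| + \sum_{r=1}^{m} C_r \left(\sum_{i=1}^{m} \beta_{ir}\,\epsilon_i\right) + \sum_{r=1}^{m} C_r \sum_{i=1}^{m} \sum_{\substack{j=1 \\ j \neq r}}^{m} D_{ij}^{r}\, b_{ij} + \sum_{r=1}^{m} |D^r|\, c_r = N(y) \text{ say}, \tag{1.12}$$

$$E(N(y)) = \sum_{r=1}^{m} C_r |D^r| = \delta_y \text{ say}.$$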


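$$V(N(y)) \cong \sum_{r=1}^{m} C_r^2 \left(\sum_{i=1}^{m} \beta_{ir}^2\, \tau_i^2\right) + \sum_{r=1}^{m} C_r^2 \sum_{i=1}^{m} \sum_{\substack{j=1 \\ j \neq r}}^{m} (D_{ij}^{r})^2 \sigma_{ij}^2 + \sum_{r=1}^{m} |D^r|^2 \omega_r^2 = \sigma_{N(y)}^2 \text{ say}. \tag{1.14}$$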


Similarly $|B + b| = D(x) = D(y)$ say (Ref. 1.5) where $E(D(y)) = \beta$
and $V(D(y)) = \sigma_\beta^2$ and

$$\sigma[N(y), D(y)] \cong \sum_{r=1}^{m} C_r \sum_{i=1}^{m} \sum_{\substack{j=1 \\ j \neq r}}^{m} D_{ij}^{r}\, \beta_{ij}\, \sigma_{ij}^2 = \sigma_{\beta N(y)} \text{ say}, \tag{1.15}$$

and the problem again is to find the distribution of the quotient

$$y = \frac{N(y)}{D(y)}. \tag{1.16}$$

It is important to notice the situations in which the above formulas
will be exact.

First, if b = 0 and c = 0 and the elements of $\epsilon$ are uncorrelated, all the
formulas given above will be exact.

Second, if c = 0 and if all the elements of b except those in one column,
say the k-th column, are identically equal to zero and the interest is only
in $x_k$, then the formulas relevant to $x_k$ will be exact.

In these two cases, if the errors are not uncorrelated but their
covariances are known, the formulas for the variances and covariances of
the numerator and the denominator can be generalized to include the
relevant covariance terms. In that case again we get an exact analysis,
for the two cases mentioned above.


II. DISTRIBUTIONS 


Suppose that the probability distributions of the elements of b, $\epsilon$
and c are known. In Section I, it has been seen that we want the
distribution of a quotient of two linear functions of the errors involved.
To find this, we could find the distributions of the numerator and
denominator separately and then find the distribution of the quotient. In
general, this could be done by writing down the joint distribution of
the errors involved and making the necessary transformations.
Alternatively, the same could be accomplished by employing characteristic
functions and the inversion formulas presented by Gurland [11].
Another way of doing the same thing may be by evaluating the moments
of the distribution as suggested by K. Pearson [15] and C. C. Craig
[4]. All of these procedures, however, seem difficult, particularly when
the number of error terms involved is large and the distributions of the
errors are non-normal. If the errors are normally distributed then one
can deal with the problem of finding the distributions of expressions
(1.11) and (1.16) as follows:

A. Probability Distributions. It may be noticed that in both the
expressions the numerators $N(x_k)$ and $N(y)$ and the denominators $D(x)$
and $D(y)$, are linear functions of the normally distributed errors. Such
functions are themselves normally distributed. $N(x_k)$ will be normally
distributed with expected value $\delta_k$ and variance $\sigma_k^2$. $N(y)$ will be
normally distributed with expected value $\delta_y$ and variance $\sigma_{N(y)}^2$. Also $D(x)$
and $D(y)$ will be normally distributed with expected value $\beta$ and
variance $\sigma_\beta^2$. In this way, the problem reduces itself to finding the dis-







tribution of a quotient of two normal variates. We will use the following
theorem proved by R. C. Geary [10].

Theorem. If N and D are normally distributed variables with E(N)
and E(D) as the mean values, $\sigma_N^2$, $\sigma_D^2$ as variances and $\sigma_{ND}$ as
covariance, and Z = N/D is the quotient, then the expression,

$$t = \frac{E(D)Z - E(N)}{(\sigma_N^2 - 2\sigma_{ND}Z + \sigma_D^2 Z^2)^{1/2}} \tag{2.1}$$

is approximately normally distributed with mean zero and variance
one, provided

$$E(D) > 3\sigma_D, \tag{2.2}$$

i.e., the coefficient of variation of D is < 1/3.

In proving this theorem, Geary shows that if (2.2) is satisfied then
the probability that Z lies between any two real numbers $Z_1$ and $Z_2$ is
practically the same as the probability that a standard normal variate
lies between $t_1$ and $t_2$, the values corresponding to $Z_1$ and $Z_2$ obtained
from (2.1). In the particular case when $E(D) = 3\sigma_D$ the difference
between the two probabilities for all values of $Z_1$ and $Z_2$ will be less than
.0027. If E(D) is greater than $3\sigma_D$, this difference will be less than a
number smaller than .0027. Assuming such difference to be negligible,
Geary concludes the truth of the theorem. The importance of the
condition (2.2) for the efficiency of the approximation is, therefore,
apparent.
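
A short Python sketch may make the use of the theorem concrete. It is an illustration only: the function names are hypothetical, the normal distribution function is obtained from `math.erf`, and the routine simply refuses to proceed when condition (2.2) fails.

```python
import math


def geary_t(z, mean_n, mean_d, var_n, var_d, cov_nd):
    """Expression (2.1) evaluated at a given value z of the quotient N/D."""
    return (mean_d * z - mean_n) / math.sqrt(var_n - 2.0 * cov_nd * z + var_d * z * z)


def std_normal_cdf(t):
    """Distribution function of the standard normal variate."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def prob_between(z1, z2, mean_n, mean_d, var_n, var_d, cov_nd):
    """Approximate probability that Z = N/D lies between z1 and z2."""
    if mean_d <= 3.0 * math.sqrt(var_d):
        raise ValueError("condition (2.2) fails: E(D) must exceed 3 sigma_D")
    t1 = geary_t(z1, mean_n, mean_d, var_n, var_d, cov_nd)
    t2 = geary_t(z2, mean_n, mean_d, var_n, var_d, cov_nd)
    return std_normal_cdf(t2) - std_normal_cdf(t1)
```

When the denominator carries no error the expression (2.1) is exactly a standardized normal variate, so for example `prob_between(1.804, 2.196, 2.0, 1.0, 0.01, 0.0, 0.0)` should be very nearly 0.95.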

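Now, using this theorem we get the probability distribution of Z as
follows:

$$f(Z)\,dZ = \frac{1}{\sqrt{2\pi}}\, \frac{[E(D)\sigma_N^2 - E(N)\sigma_{ND}] + Z[E(N)\sigma_D^2 - E(D)\sigma_{ND}]}{[\sigma_N^2 - 2\sigma_{ND}Z + \sigma_D^2 Z^2]^{3/2}} \times \exp\left\{-\frac{1}{2}\, \frac{[ZE(D) - E(N)]^2}{\sigma_N^2 - 2\sigma_{ND}Z + \sigma_D^2 Z^2}\right\} dZ. \tag{2.3}$$

Identifying Z with $x_k$ we get the probability distribution of $x_k$,

$$f(x_k)\,dx_k = \frac{1}{\sqrt{2\pi}}\, \frac{[\beta\sigma_k^2 - \delta_k\sigma_{\beta k}] + x_k[\delta_k\sigma_\beta^2 - \beta\sigma_{\beta k}]}{[\sigma_k^2 - 2\sigma_{\beta k}x_k + \sigma_\beta^2 x_k^2]^{3/2}} \times \exp\left\{-\frac{1}{2}\, \frac{(x_k\beta - \delta_k)^2}{\sigma_k^2 - 2\sigma_{\beta k}x_k + \sigma_\beta^2 x_k^2}\right\} dx_k. \tag{2.4}$$

Again identifying Z with y we get the corresponding probability
distribution:

$$f(y)\,dy = \frac{1}{\sqrt{2\pi}}\, \frac{[\beta\sigma_{N(y)}^2 - \delta_y\sigma_{\beta N(y)}] + y[\delta_y\sigma_\beta^2 - \beta\sigma_{\beta N(y)}]}{[\sigma_{N(y)}^2 - 2\sigma_{\beta N(y)}\,y + \sigma_\beta^2 y^2]^{3/2}} \times \exp\left\{-\frac{1}{2}\, \frac{(y\beta - \delta_y)^2}{\sigma_{N(y)}^2 - 2\sigma_{\beta N(y)}\,y + \sigma_\beta^2 y^2}\right\} dy. \tag{2.5}$$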





B. Confidence limits. In practice, we may not be interested in the
theoretical distributions as suggested in Section II A. In this section,
we consider probability limits of the quotient Z = N/D obtained
directly without writing down the distributions. We have seen that

$$t = \frac{E(D)Z - E(N)}{(\sigma_N^2 - 2\sigma_{ND}Z + \sigma_D^2 Z^2)^{1/2}}$$

is approximately normally distributed with mean zero and variance
unity. If we want limits with 100α per cent probability, where the
probability coefficient α is a fraction, then consulting the standard
normal tables, we find a positive number γ such that the probability
of the absolute value of the standard normal variate being greater than
γ is (1 - α). In other words

$$P\left[\,\left|\frac{E(D)Z - E(N)}{(\sigma_N^2 - 2\sigma_{ND}Z + \sigma_D^2 Z^2)^{1/2}}\right| \leq \gamma\right] = \alpha \tag{2.6}$$

or

$$P\left[\{E(D)Z - E(N)\}^2 \leq \gamma^2\{\sigma_N^2 - 2\sigma_{ND}Z + \sigma_D^2 Z^2\}\right] = \alpha$$

or

$$P\left[\{E(D)^2 - \gamma^2\sigma_D^2\}Z^2 - 2\{E(D)\,E(N) - \gamma^2\sigma_{ND}\}Z + \{E(N)^2 - \gamma^2\sigma_N^2\} \leq 0\right] = \alpha.$$

P stands for a probability statement.
Thus the roots of the quadratic equation in Z:

$$\{E(D)^2 - \gamma^2\sigma_D^2\}Z^2 - 2\{E(D)\,E(N) - \gamma^2\sigma_{ND}\}Z + \{E(N)^2 - \gamma^2\sigma_N^2\} = 0$$

will give the two numbers and Z lies between those limits with 100α
per cent probability.


Applying this general formula to find the probability limits of $x_k$
and y of Section I, we see that the roots of the quadratic equation in Z:

$$\{\beta^2 - \gamma^2\sigma_\beta^2\}Z^2 - 2\{\delta_k\beta - \gamma^2\sigma_{\beta k}\}Z + \{\delta_k^2 - \gamma^2\sigma_k^2\} = 0 \tag{2.8}$$

will give the probability limits for solution $x_k$ (for $k = 1, 2, \cdots, m$)
and the roots of equation

$$\{\beta^2 - \gamma^2\sigma_\beta^2\}Z^2 - 2\{\delta_y\beta - \gamma^2\sigma_{\beta N(y)}\}Z + \{\delta_y^2 - \gamma^2\sigma_{N(y)}^2\} = 0 \tag{2.9}$$

will give the probability limits for the linear function y. In both cases,
the probability is 100α per cent.
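
The quadratic can be solved mechanically once the five moments are in hand. The following Python sketch is illustrative and its function name is hypothetical. It assumes that the coefficient of Z² is positive (which holds whenever E(D) exceeds γ times the standard deviation of D) and that the roots are real, and it raises an error otherwise.

```python
import math


def probability_limits(mean_n, mean_d, var_n, var_d, cov_nd, gamma):
    """Roots of the quadratic in Z for the quotient Z = N/D.

    gamma is the two-sided standard normal point for the chosen probability
    (for example 1.96 for 95 per cent).
    """
    a = mean_d ** 2 - gamma ** 2 * var_d
    b = mean_d * mean_n - gamma ** 2 * cov_nd
    c = mean_n ** 2 - gamma ** 2 * var_n
    disc = b * b - a * c
    if a <= 0.0 or disc < 0.0:
        raise ValueError("the quadratic does not yield a finite interval")
    root = math.sqrt(disc)
    return (b - root) / a, (b + root) / a


# For x_k use mean_n = delta_k, mean_d = beta, var_n = sigma_k^2,
# var_d = sigma_beta^2 and cov_nd = sigma_beta_k, as in (2.8).
lower, upper = probability_limits(2.0, 1.0, 0.01, 0.0, 0.0, 1.96)
```

In this toy case the denominator has no error, so the limits reduce to 2 ± 1.96 × 0.1, which is the familiar normal interval for the numerator alone.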

C. Cumulative distributions. If the frequency distribution of a
continuous variable is known, its cumulative distribution is obtained by
integration from the lower limit of the range of variation to another
variable. For instance, in Section II A it was indicated how to write
down the distributions of $x_k$ and y, that is, the distributions of the
elements of X individually as well as that of a linear function of these
elements. The variable Z = N/D where N and D are normal variates,
can represent both functions. If f(Z) represents the frequency function
of Z, then F(v), the corresponding cumulative distribution function,
is obtained as follows:

$$F(v) = \int_{-\infty}^{v} f(Z)\,dZ. \tag{2.10}$$

In practice, however, it may be too tedious to evaluate the integral.
Therefore, we suggest the following procedure, based on a paper by
Fieller [9]. He has shown that the chance of obtaining a value of the
variable Z = N/D not less than v, that is {1 - F(v)}, can be computed as
below:

$$\{1 - F(v)\} = \int_{h}^{\infty}\!\!\int_{k}^{\infty} N(\rho)\,dx\,dy + \int_{-h}^{\infty}\!\!\int_{-k}^{\infty} N(\rho)\,dx\,dy \tag{2.11}$$

where $N(\rho)$ is bivariate (x and y) normal distribution with means
zeros, both variances equal to unity and correlation coefficient $\rho$. The
constants h, k and $\rho$ are computed as follows:

$$h = \frac{E(D)}{\sigma_D}, \tag{2.12}$$

$$k = \frac{E(N) - vE(D)}{[\sigma_N^2 - 2\sigma_{ND}v + \sigma_D^2 v^2]^{1/2}} \tag{2.13}$$

and

$$\rho = \frac{\sigma_{ND} - v\sigma_D^2}{\sigma_D(\sigma_N^2 - 2\sigma_{ND}v + \sigma_D^2 v^2)^{1/2}}.$$







The values of the integrals in (2.11) can be obtained from K. Pearson’s
tables [16]. The tables are available only for positive values of
h and k. However, the following relations regarding the bivariate normal
distribution with the means zero and the variances unity and
correlation coefficient $\rho$, can be used in case either or both of h and k are
negative.

$$\int_{-h}^{\infty}\!\!\int_{k}^{\infty} N(\rho)\,dx\,dy = \int_{k}^{\infty} N(0, 1)\,dy - \int_{h}^{\infty}\!\!\int_{k}^{\infty} N(-\rho)\,dx\,dy \tag{2.15}$$

$$\int_{h}^{\infty}\!\!\int_{-k}^{\infty} N(\rho)\,dx\,dy = \int_{h}^{\infty} N(0, 1)\,dx - \int_{h}^{\infty}\!\!\int_{k}^{\infty} N(-\rho)\,dx\,dy \tag{2.16}$$

and

$$\int_{-h}^{\infty}\!\!\int_{-k}^{\infty} N(\rho)\,dx\,dy = 1 - \int_{h}^{\infty} N(0, 1)\,dx - \int_{k}^{\infty} N(0, 1)\,dy + \int_{h}^{\infty}\!\!\int_{k}^{\infty} N(\rho)\,dx\,dy \tag{2.17}$$

where $N(\rho)$ refers to the bivariate normal distribution and N(0, 1)
refers to normal distribution with mean zero and variance one.

By the use of the tables and the formulas (2.11) to (2.17) we can
get the probability that Z will not be less than any pre-assigned
number v.

To get the probability that Z will not be greater than v we compute
{1 - F(v)} by the above procedure and subtract it from unity.

It may be noticed that the approach suggested in this Section, in one 
way, is better than the approach discussed in Sections II A and II B. 
Geary’s approach gives approximate results, though the efficiency of 
the approximation is guaranteed by the condition (2.2). On the other 
hand, Fieller’s approach gives exact probabilities. In case (2.2) is satis- 
fied, the results obtained from both approaches will be approximately 
equal. If (2.2) is not satisfied, the procedure outlined in this section 
(II C) should be relied upon. However, sections II A and II B have 
their own advantages. Section II A provides approximate distributions
of $x_i$ and y if needed, and Section II B provides quadratic equations
which give probability limits directly without the help of K. Pearson’s 
tables. 


III. APPLICATION TO LINEAR PROGRAMMING 


A. In this section we will show an application of the above analysis. 
It has been said in the introductory remarks that a non-degenerate 







solution of a programming problem occurs as the solution of a set of 
linear equations, 


BX = Q. 


First of all, we will show it by presenting an empirical example 
worked out by the author [1]. An optimum production program was 
computed for a model family farm in Iowa which has the inputs shown 
in Table 1. Five crops—corn, oats, soybeans, flax and wheat—were 


TABLE 1
FIXED RESOURCES

Land, acres          148
Capital, dollars     1800
Labor, distributed over the year as below

         hours            hours            hours
Jan.     182      May     182      Sept.   182
Feb.     182      June    234      Oct.    182
Mar.     182      July    234      Nov.    182
Apr.     182      Aug.    234      Dec.    182





considered. The relevant input-output data for the period 1928-1952 
were obtained for Hancock County (Ellsworth township) in Iowa and 
the input coefficients were computed over the same period. Their 
averages were used to derive the optimum program using the tech- 
niques of linear programming [3, 5]. The matrix of the input coef- 
ficients is shown in Table 2. Land, capital and the labor in the months 


TABLE 2
MATRIX OF INPUT COEFFICIENTS

                  Corn      Oats      Soybeans   Flax      Wheat

Land              .02274    .02770    .05862     .09249    .09081
Capital           .31772    .27870    .70812     .96956    1.00356
Labor (May)       .05253    0         .11681     0         0
Labor (July)      .02555    .07523    .05485     .21186    .42324
Labor (August)    0         .08370    0          .30910    .08650







of May, July and August were assumed to be limitational, though 
after deriving the program, labor requirements in other months were 
also checked. The objective was to maximize gross capital return and 
for this purpose average Iowa prices for 1952 were used. These are 
shown in Table 3. Thus if $x_1, x_2, x_3, x_4$, and $x_5$ represent the levels of


TABLE 3
PRICES OF CROPS USED IN THE EXAMPLE

Crop          Price Per Bushel

Corn          1.56
Oats           .84
Soybeans      2.79
Flax          3.81
Wheat         2.14


crops (in bushels) which may occur in the program, the problem for
solution was to find the amounts $x_1, x_2, x_3, x_4$, and $x_5$ of corn, oats,
soybeans, flax, and wheat respectively which maximize the linear
function

$$(1.56)x_1 + (.84)x_2 + (2.79)x_3 + (3.81)x_4 + (2.14)x_5$$

subject to the inequalities

(i)  $$\begin{aligned}
(.02274)x_1 + (.02770)x_2 + (.05862)x_3 + (.09249)x_4 + (.09081)x_5 &\le 148 \\
(.31772)x_1 + (.27870)x_2 + (.70812)x_3 + (.96956)x_4 + (1.00356)x_5 &\le 1800 \\
(.05253)x_1 + 0x_2 + (.11681)x_3 + 0x_4 + 0x_5 &\le 182 \\
(.02555)x_1 + (.07523)x_2 + (.05485)x_3 + (.21186)x_4 + (.42324)x_5 &\le 234 \\
0x_1 + (.08370)x_2 + 0x_3 + (.30910)x_4 + (.08650)x_5 &\le 234,
\end{aligned}$$

and

(ii)  $x_1, x_2, x_3, x_4$, and $x_5$ are $\ge 0$.

Introducing the disposal variables $x_6, x_7, x_8, x_9, x_{10}$ and using the
simplex method [3, 5], the following program was derived.





(3.1)  $$\begin{aligned}
x_1 &= 3464.49718 \\
x_4 &= 686.73761 \\
x_6 &= 5.70277 \\
x_7 &= 33.44223 \\
x_{10} &= 21.73124
\end{aligned}$$


Thus we should plan to produce 3464.5 bushels of corn and 687 bushels
of flax, which will require the inputs shown in Table 4. The labor
required in this program during the rest of the months was also
computed and it did not exceed the amounts available in those months.


TABLE 4
SCHEDULE OF INPUTS

Input            Unit       Corn       Flax      Disposal   Total

Land             Acres      78.78      63.52     5.70       148
Capital          Dollars    1100.73    665.83    33.44      1800
Labor, May       Hours                 0         0          182
Labor, July      Hours                 145.49    0          234
Labor, Aug.      Hours                 212.27    21.73      234





Now we notice that the quantities shown in (3.1) satisfy the following
equations,

(3.2)  $$\begin{aligned}
(.02274)x_1 + (.09249)x_4 + (1)x_6 + (0)x_7 + (0)x_{10} &= 148 \\
(.31772)x_1 + (.96956)x_4 + (0)x_6 + (1)x_7 + (0)x_{10} &= 1800 \\
(.05253)x_1 + (0)x_4 + (0)x_6 + (0)x_7 + (0)x_{10} &= 182 \\
(.02555)x_1 + (.21186)x_4 + (0)x_6 + (0)x_7 + (0)x_{10} &= 234 \\
(0)x_1 + (.30910)x_4 + (0)x_6 + (0)x_7 + (1)x_{10} &= 234
\end{aligned}$$

which corresponds to the set

(3.3)  $$BX = Q.$$

The expected gross profit and the corn and flax yields are as follows.

Corn ($x_1$)          3464.5 bushels
Flax ($x_4$)          687 bushels
Gross profit (y)      $8,021
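
As a rough numerical check (a sketch added for illustration, not part of the original computation), the system (3.2) can be solved directly from the coefficients as printed, with the capital coefficient of corn taken as .31772, the value shown in Table 2. The helper `solve_linear` is a hypothetical name for ordinary Gaussian elimination with partial pivoting. Because the printed coefficients are rounded to five decimals, the result should agree with (3.1) only approximately.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

# Coefficient matrix B of (3.2); columns are x1, x4, x6, x7, x10.
B = [
    [0.02274, 0.09249, 1.0, 0.0, 0.0],   # land
    [0.31772, 0.96956, 0.0, 1.0, 0.0],   # capital
    [0.05253, 0.0,     0.0, 0.0, 0.0],   # labor, May
    [0.02555, 0.21186, 0.0, 0.0, 0.0],   # labor, July
    [0.0,     0.30910, 0.0, 0.0, 1.0],   # labor, August
]
Q = [148.0, 1800.0, 182.0, 234.0, 234.0]

x1, x4, x6, x7, x10 = solve_linear(B, Q)
profit = 1.56 * x1 + 3.81 * x4           # prices of corn and flax
print(round(x1, 1), round(x4, 1), round(x6, 2), round(x7, 2), round(x10, 2))
print(round(profit))
```

The solved levels differ from (3.1) in the first decimal place or so, which is the order of discrepancy one would expect from five-decimal rounding of the input coefficients.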







The purpose of this section is to show a simple application of Section
II. Therefore, a detailed discussion of the assumptions and economic
implications of the use of linear programming in farm production
planning is avoided here. Interested readers may refer to [1].

B. Statistical Analysis. The input coefficients (the non-negative
coefficients of $x_1$ and $x_4$ in equations 3.2) are the averages of the
corresponding empirical values computed over the past twenty-five
years. The unbiased estimates of the variances computed from the
same series were used to apply the analysis of Section II, under the
assumption that these coefficients are normally distributed. The
fixed inputs (Q) and the prices (C) were assumed to be without random
errors. It can be verified that in this example only three variances
are needed in formulas (1.7), (1.9), (1.10), (1.14) and (1.15).
These variances are as follows:


1) Variance of the labor input coefficient for May for corn which is
.000227125.

2) Variance of the labor input coefficient for July for flax which is 
.003262968, and 

3) Variance of the labor input coefficient for July for corn which is 
.000053646. 


With the help of these estimated variances and by the direct applica- 
tion of the quadratic equations (2.8) and (2.9), the following 95 per 
cent probability limits were obtained. 








                           Lower Limit    Upper Limit

Corn ($x_1$), bushels      2,088          10,282
Flax ($x_4$), bushels      46.5           1,625
Gross profit (y)           $6,298         $18,866





Comparison of the lower limit of the gross profit with its expected 
amount, $8021, gives an increased confidence in the program. 

Next the following probabilities were investigated along the lines 
suggested in Section II C. 
A. Corn yield

$$P[x_1 < 2000] = .0277$$
$$P[x_1 \ge 10000] = .02077$$

B. Flax yield

$$P[x_4 < 50] = .027$$
$$P[x_4 \ge 1600] = .0228$$

C. Gross profit

$$P[y < 6500] = .0276$$
$$P[y \ge 18000] = .0232$$


This means that the probability is only .0276 that the profit will be 
less than $6500. The usefulness of this result is obvious. 

Thus we notice that the analysis of Section II can be applied in 
practical problems with the advantage of easy computation. It is 
based on the approximate reduction of Section I, which is not objec- 
tionable if the random fluctuations in the constants involved in the 
model (the input coefficients in the case of linear programming) are 
sufficiently small. 

We believe that the application of this probability approach to the 
problems of linear programming and inter-industry analysis gives a 
greater flexibility to the handling of the input coefficients. 

Instead of assuming that these input-output relations are fixed
constants, not subject to any variation during the period between
planning and action, we assume that fluctuations are possible but are
random and normally distributed. Thus instead of estimating fixed
input coefficients, we may estimate their distributions, that is,
their expected values and variances. This may be done by using
technical information or the relevant past data. The program is
planned on the basis of the expected values of the input coefficients,
and the variances are used to form an idea of the variability of the
outcome of the program.

There is another situation, wherein this probability approach may 
be very helpful. Sometimes, we may have more than one optimum 
solution of a linear programming problem, which give the same value 
of the “objective function.” The highest lower limit of the “objective 
function” for the same probability will then provide a criterion of 
choice among those alternative programs. 

It is admitted that the approximation suggested in Section I is 
rather crude. We could have retained higher order product terms of the 
random errors. The derivation of the distributions in that case would 
become highly complicated. Even if the errors are normally distributed, 
their product is not normally distributed. For simplicity of presenta- 
tion of this approach and for readily applicable results, we have con- 
fined ourselves to first order approximations only. As has been seen 
in the preceding example, however, and in many other practical cases, 
the values of the coefficients may be less than one (see eq. (3.2)). As
a matter of fact, the unit for the variables ($x_i$) may be so defined
that the values of the coefficients, which represent inputs required
per unit of a particular output, will be less than one. Therefore, the
possible random errors in those coefficients will, for practical
reasons, be smaller still. For example, if the mean land input
coefficient is .02274, random fluctuations cannot make it negative.
Hence the random error will be far less than .02274. Neglecting the
second and third order products of such errors will not introduce
serious bias in the results.


REFERENCES 


[1] Babbar, M. M., Statistical Approach in Planning Production Programs for
Interdependent Activities, Doctoral Dissertation, Ames, Iowa State College 
Library (1953). 

[2] Box, G. E. P., and Hunter, J. S., “A confidence region for the solution of a 
set of simultaneous equations with an application to experimental designs,” 
Biometrika, 41 (1953), 190-9. 

[3] Charnes, A., Cooper, W. W., and Henderson, A., An Introduction to Linear 
Programming, New York: John Wiley and Sons, Inc. (1953). 

[4] Craig, C. C., “The frequency function of y/x,” Annals of Mathematics, 
30 (1929), 471-86. 

[5] Dantzig, George B., “Maximization of linear function of variables subject 
to linear inequalities,” in Activity Analysis of Production and Allocation, 
Edited by Tjalling C. Koopmans, New York: John Wiley and Sons, Inc. 
(1951). 

[6] Dorfman, Robert, Application of Linear Programming to the Theory of the
Firm, Berkeley: University of California (1951). 

[7] Dwyer, Paul S., Linear Computations, New York: John Wiley and Sons, 
Inc. (1951). 

[8] Fieller, E. C., “The biological standardization of insulin,” Journal of Royal 
Statistical Society, Supplement, 7 (1940), 1-53. 

[9] Fieller, E. C., “The distribution of the index in a normal bivariate
population,” Biometrika, 24 (1932), 428-40.

[10] Geary, R. C., “The frequency distribution of the quotient of two normal
variates,” Journal of the Royal Statistical Society, 93 (1930), 442-6.

[11] Gurland, John, “Inversion formulae,” Annals of Mathematical Statistics,
19 (1948), 228-37.

[12] Hotelling, Harold, “Some new methods in matrix calculations,” Annals of
Mathematical Statistics, 14 (1943), 1-33.

[13] Leontief, W. W., The Structure of the American Economy, New York: Oxford 
University Press (1951). 

[14] Lonseth, A. T., “Systems of linear equations with coefficients subject to 
error,” Annals of Mathematical Statistics, 13 (1942), 332-7.

[15] Pearson, Karl, “On the constants of index distributions, as deduced from 
the like constants for the components of the ratio, with special reference 
to the opsonic index,” Biometrika, 7 (1910), 531-41. 

[16] Pearson, Karl, Tables for Statisticians and Biometricians, Part II, Cambridge 
University Press (1924) 

[17] Tintner, Gerhard, “The distribution of solutions of linear equations whose
coefficients are subject to error.” Unpublished, Department of Economics 
and Sociology, Iowa State College. 

[18] Turing, A. M., “Rounding off errors in matrix processes,” Quarterly Journal
of Mechanics and Applied Mathematics, 1 (1947), 287-308. 





ESTIMATION OF PARAMETERS FROM 
INCOMPLETE DATA 


Frederic M. Lord
Educational Testing Service 


1. INTRODUCTION AND SUMMARY


The present note is concerned with a special case of the general
problem of obtaining efficient estimators for the parameters of a
normal multivariate population when the available sample data are 
incomplete in the sense that measures on all variables are not available 
for all individuals in the sample. Such fragmentary data may arise 
because part of the data are irretrievably lost (e.g., in an archaeological 
find), or because certain data were purposely not collected. The de- 
cision not to measure all individuals in the sample on all variables 
may be reached because of the cost of measurement, because of limited 
time, because the measurement of a certain variable alters or destroys 
the individual measured (e.g., in mental testing, testing explosives), 
and so forth. 

The general problem for normal bivariate populations has been 
treated by Wilks [3]. This treatment has been further generalized to 
normal multivariate populations by Matthai [2], who deals explicitly 
with the trivariate case. Unfortunately, the general maximum likelihood 
equations have proved rather intractable, and no simple formulas for 
the maximum likelihood estimators are available in the general case, 
even for a sample from a bivariate population. 

In the present paper, the problem of estimating the parameters of 
a normal trivariate population from incomplete data is dealt with in a 
special case for which explicit solutions to the maximum likelihood 
equations are readily obtained. This special case is described in the 
following section. Formulas for the maximum likelihood estimators 
are given; their application is illustrated by a numerical example. The 
sampling variances and covariances of the maximum likelihood estimators
are derived. An examination is made of the efficiency of the usual
methods that utilize only that portion of the data that is complete. 

The foregoing results are specialized to apply to a commonly en- 
countered bivariate (rather than trivariate) situation. 


2. PROBLEM 


We will be concerned with three variables, u, v, and w, which are 
assumed to have a normal trivariate distribution in the population 




from which the available random sample of individuals has been drawn. 
In the available data, variable w is recorded for all individuals; either 
u or v is recorded for all individuals, but not both. The N′ individuals
for whom u is recorded will be denoted collectively as group a; the N″
individuals with v will be denoted as group b. The total number of
individuals in the sample is N = N′ + N″.

If u and v are correlated with w, it is obvious after a little thought
that the data for group b contain some information relevant for esti- 
mating the parameters of variable u, and that the data for group a 
contain some information relevant for estimating the parameters of v. 
The problem is to use the available data as efficiently as possible for 
estimating the parameters concerned. 


3. THE MAXIMUM LIKELIHOOD ESTIMATORS 


The joint distribution of u, v, and w involves nine parameters, which
may be defined in various ways. A commonly used set of nine parameters
is the means ($\mu_u, \mu_v, \mu_w$), the variances ($\sigma_u^2, \sigma_v^2, \sigma_w^2$), and the
correlations ($\rho_{uw}, \rho_{vw}, \rho_{uv}$). The last correlation coefficient ($\rho_{uv}$) will
not be discussed further, since it cannot be estimated from the data at
hand. (Although upper and lower bounds to $\rho_{uv}$ may be set by making use
of the fact that the correlation matrix must be positive definite, these
bounds are ordinarily very far apart.) For convenience, the subscript
w will be dropped from the two remaining correlation coefficients.

The maximum likelihood estimators for the means are (see section 5):

$$\mu_w^* = \bar{w}, \qquad (1)$$

$$\mu_u^* = \bar{u}' - \beta_u^*(\bar{w}' - \bar{w}), \qquad (2)$$

$$\mu_v^* = \bar{v}'' - \beta_v^*(\bar{w}'' - \bar{w}). \qquad (3)$$

Here the estimators are denoted by “stars” (*); $\bar{u}$, $\bar{v}$, and $\bar{w}$ are sample
means; sample statistics relating to groups a and b, respectively, are
distinguished from statistics relating to the combined groups by single
and double primes; and $\beta_u^*$ and $\beta_v^*$ are estimators defined in equations
9 and 10. It is readily shown that $\mu_w^*$, $\mu_u^*$, and $\mu_v^*$ are unbiased estimators.

Instead of estimating the five parameters $\sigma_w, \sigma_u, \sigma_v, \rho_u$, and $\rho_v$
directly, it is convenient to replace these by an equivalent set consisting
of $\sigma_w$, the standard errors of estimate $\sigma_{u\cdot w}$ and $\sigma_{v\cdot w}$, and the regression
coefficients $\beta_u$ and $\beta_v$, where





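$$\sigma_{u\cdot w}^2 = \sigma_u^2(1 - \rho_u^2), \qquad \sigma_{v\cdot w}^2 = \sigma_v^2(1 - \rho_v^2); \qquad (4)$$

$$\beta_u = \sigma_u\rho_u/\sigma_w, \qquad \beta_v = \sigma_v\rho_v/\sigma_w. \qquad (5)$$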


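The maximum likelihood estimator for any parameter in this new set
is simply the corresponding sample statistic:

$$\sigma_w^{*2} = s_w^2 = \frac{1}{N}\sum_{a=1}^{N}(w_a - \bar{w})^2, \qquad (6)$$

$$\sigma_{u\cdot w}^{*2} = s_{u\cdot w}^{\prime 2} = s_u^{\prime 2}(1 - r_{uw}^{\prime 2}), \qquad (7)$$

$$\sigma_{v\cdot w}^{*2} = s_{v\cdot w}^{\prime\prime 2} = s_v^{\prime\prime 2}(1 - r_{vw}^{\prime\prime 2}), \qquad (8)$$

$$\beta_u^* = b_{uw}' = s_u' r_{uw}'/s_w', \qquad (9)$$

$$\beta_v^* = b_{vw}'' = s_v'' r_{vw}''/s_w'', \qquad (10)$$

where s, r, and b denote the usual sample standard deviation, correlation
coefficient, and regression coefficient.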

Maximum likelihood estimators for $\sigma_u, \sigma_v, \rho_u$, and $\rho_v$ are readily
obtained by solving equations 4 and 5 for these parameters and replacing
all parameters by their estimators. Thus,

$$\sigma_u^{*2} = \sigma_{u\cdot w}^{*2} + \beta_u^{*2}\sigma_w^{*2}, \qquad (11)$$

$$\rho_u^{*2} = \frac{\beta_u^{*2}\sigma_w^{*2}}{\sigma_{u\cdot w}^{*2} + \beta_u^{*2}\sigma_w^{*2}}, \qquad (12)$$

and similarly for v. It should be noted that maximum likelihood
estimators for $\sigma_u, \sigma_v, \rho_u$, and $\rho_v$ are not provided by the corresponding
sample statistics.


4. NUMERICAL EXAMPLE 


Table 1 gives some data* used for equating two psychological tests, 
u and v. Tests u and w were administered to group a; tests v and w, 
to group b. (Tests u and v could not both be administered to the same 
group because of limited testing time; test w was a very short test 
that avoided this difficulty.) The practical problem here is to estimate 


* The writer is indebted to W. H. Angoff for these data. 







$\mu_u$, $\mu_v$, $\sigma_u$, and $\sigma_v$. (Given these estimates, it is then possible [1] to
convert a score on test v to an “equivalent” score on test u, or vice
versa; however, this further step does not concern us here.)

From Table 1 and equations (7-10), it is found that $\beta_u^* = 2.309$,
$\beta_v^* = 2.060$, $\sigma_{u\cdot w}^{*2} = 68.92$, $\sigma_{v\cdot w}^{*2} = 58.73$. From equations (2) and (3),
the required estimated means are found to be $\mu_u^* = 37.72$ and $\mu_v^* = 34.13$;
from equation (11), it follows that the required estimated variances
are $\sigma_u^{*2} = 162.11$ and $\sigma_v^{*2} = 132.91$. These results complete the solution
insofar as the problem is one of estimation.

A heuristic statistical interpretation of these computations is the
following. Consider the problem of estimating $\mu_u$. It is observed from


TABLE 1
DATA FOR EQUATING TESTS u AND v

                      Group a (N′ = 506)                  Group b (N″ = 511)                  Combined Groups
                      Test u            Test w            Test v            Test w            Test w

Mean                  ū′ = 38.04        w̄′ = 17.33        v̄″ = 33.84        w̄″ = 17.05        w̄ = 17.19
Variance              s′_u² = 164.11    s′_w² = 17.86     s″_v² = 131.16    s″_w² = 17.06     s_w² = 17.48
Correlation with w    r′_uw = .7616                       r″_vw = .7431





the data that group a has a larger sample mean in the w variable than
group b, i.e., $\bar{w}' > \bar{w}''$. Since both groups are random samples from the
same population, and since w is (apparently) positively correlated with
u, it is plausible that the sample mean of u in group a is larger than the
(unavailable) sample mean of u in group b, i.e., $\bar{u}' > \bar{u}''$, and hence
$\bar{u}' > \bar{u}$, where $\bar{u} = (N'\bar{u}' + N''\bar{u}'')/N$. Since this last statistic, if it were
available, would be the efficient estimate of $\mu_u$, the estimate $\bar{u}'$ should
be adjusted downward. The size of the adjustment required is given
by formula (2). Similar reasoning applies to the estimation of $\mu_v$.
Analogous heuristic remarks apply to the estimation of $\sigma_u^2$ and $\sigma_v^2$.
The fact that (2) and (3) provide unbiased estimators is evidence that
the adjustments made in estimating $\mu_u$ and $\mu_v$ are of the correct size,
even in small samples.

If the statistician did not avail himself of the relevant information
in group b, he would simply use the observed mean, $\bar{u}'$, as an estimate
of $\mu_u$. The efficiency of this simple estimating procedure in the case
of the present numerical example is seen from (17) to be only .71.
The efficiency of $s_u^{\prime 2}$ as an estimate of $\sigma_u^2$ is seen from (19) to be .83.
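
The arithmetic of this example can be retraced in a few lines. The sketch below is illustrative only: the variable and function names are hypothetical, the inputs are the sample statistics of Table 1, and the steps follow equations (2), (3), (7) to (12) and the efficiency expressions (17) and (19) as read here. Since the regression coefficients are carried at full precision rather than rounded to three decimals, the variances differ from the printed 162.11 and 132.91 in the second decimal place.

```python
import math

# Sample statistics from Table 1 (group a: tests u and w; group b: tests v and w).
n_a, n_b = 506, 511
n_total = n_a + n_b
mean_w = 17.19          # combined-group mean of w
var_w = 17.48           # combined-group variance of w

def estimates(mean_x, mean_w_grp, var_x, var_w_grp, r):
    """Estimates for one test (u or v) from its own group and the combined w data."""
    b = math.sqrt(var_x) * r / math.sqrt(var_w_grp)   # eq. (9) or (10)
    resid_var = var_x * (1.0 - r * r)                 # eq. (7) or (8)
    mean_hat = mean_x - b * (mean_w_grp - mean_w)     # eq. (2) or (3)
    var_hat = resid_var + b * b * var_w               # eq. (11)
    rho_sq = b * b * var_w / var_hat                  # eq. (12)
    return b, resid_var, mean_hat, var_hat, rho_sq

b_u, res_u, mu_u, var_u, rho_u_sq = estimates(38.04, 17.33, 164.11, 17.86, 0.7616)
b_v, res_v, mu_v, var_v, rho_v_sq = estimates(33.84, 17.05, 131.16, 17.06, 0.7431)

# Efficiency of the unadjusted group-a mean and variance, eqs. (17) and (19).
eff_mean_u = 1.0 - (n_b / n_total) * rho_u_sq
eff_var_u = 1.0 - (n_b / n_total) * rho_u_sq ** 2

print(round(b_u, 3), round(res_u, 2), round(mu_u, 2), round(var_u, 2))
print(round(b_v, 3), round(res_v, 2), round(mu_v, 2), round(var_v, 2))
print(round(eff_mean_u, 2), round(eff_var_u, 2))
```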





5. THE LIKELIHOOD FUNCTION 


The detailed derivation of the formulas for the maximum likelihood 
estimators will be given elsewhere [1] as part of a discussion of a par- 
ticular application. These formulas are derived by the usual methods 
from the likelihood function, which is constructed as follows. 

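The likelihood function of u and w for group a is clearly

$$\frac{1}{(2\pi\sigma_u\sigma_w\alpha_u)^{N'}} \exp\left\{-\frac{1}{2\alpha_u^2}\sum_{a=1}^{N'}\left[\frac{(u_a - \mu_u)^2}{\sigma_u^2} + \frac{(w_a - \mu_w)^2}{\sigma_w^2} - \frac{2\rho_u(u_a - \mu_u)(w_a - \mu_w)}{\sigma_u\sigma_w}\right]\right\} \qquad (13)$$

where $\alpha_u^2 = 1 - \rho_u^2$. A similar expression in v and w holds for group b.
The likelihood function for the entire set of data is simply the product
of these two separate functions.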


6. SAMPLING VARIANCES; THE EFFICIENCY OF OTHER ESTIMATORS 


Table 2 gives the matrix of the large-sample variances and covariances
of the maximum-likelihood estimators for the population means. The
estimators for $\beta_u, \beta_v, \sigma_{u\cdot w}, \sigma_{v\cdot w}$, and $\sigma_w$ are uncorrelated in


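TABLE 2

MATRIX OF THE SAMPLING VARIANCES AND COVARIANCES
OF THE MAXIMUM LIKELIHOOD ESTIMATORS OF THE
POPULATION MEANS

$$\begin{array}{c|ccc}
 & \mu_u^* & \mu_v^* & \mu_w^* \\ \hline
\mu_u^* & \dfrac{\sigma_u^2}{N'}\left(1 - \dfrac{N''}{N}\rho_u^2\right) & \dfrac{\sigma_u\sigma_v\rho_u\rho_v}{N} & \dfrac{\sigma_u\sigma_w\rho_u}{N} \\
\mu_v^* & \dfrac{\sigma_u\sigma_v\rho_u\rho_v}{N} & \dfrac{\sigma_v^2}{N''}\left(1 - \dfrac{N'}{N}\rho_v^2\right) & \dfrac{\sigma_v\sigma_w\rho_v}{N} \\
\mu_w^* & \dfrac{\sigma_u\sigma_w\rho_u}{N} & \dfrac{\sigma_v\sigma_w\rho_v}{N} & \dfrac{\sigma_w^2}{N}
\end{array}$$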
















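large samples; the large-sample variances of these estimators are as
follows:

$$\mathrm{Var}\,\sigma_w^{*2} = \frac{2\sigma_w^4}{N}, \qquad (14)$$

$$\mathrm{Var}\,\sigma_{u\cdot w}^{*2} = \frac{2\sigma_{u\cdot w}^4}{N'}, \qquad (15)$$

$$\mathrm{Var}\,\beta_u^* = \frac{\sigma_{u\cdot w}^2}{N'\sigma_w^2}, \qquad (16)$$

and so forth.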

Various investigations can be made regarding the amount of additional
information obtained by making full use of the fragmentary
data available. For example, if $\mu_u$ is estimated by $\bar{u}'$, the sampling
variance of the estimate is, of course, $\sigma_u^2/N'$. The efficiency of this
estimate in the present situation is the reciprocal of the
ratio of its sampling variance to the sampling variance in the upper
left cell of Table 2, viz.,

$$1 - \frac{N''}{N}\rho_u^2. \qquad (17)$$

It is apparent that if $\rho_u$ is large and the proportion of cases in the
b-group is large, then utilization of the data on w in the b-group will
improve the estimation of $\mu_u$ very considerably.

Equations (7) and (9) show that estimates of $\beta_u$ and $\sigma_{u\cdot w}$ are not
improved by using information from the b-group data. Estimates of
$\sigma_u$ and of $\rho_u$ will be improved, however. For example, the sampling
variance of $\sigma_u^{*2}$ in large samples is found from (11), (12), (14), (15), (16)
to be

$$\mathrm{Var}\,\sigma_u^{*2} = \frac{2\sigma_u^4}{N'}\left(1 - \frac{N''}{N}\rho_u^4\right). \qquad (18)$$

The efficiency of $s_u^{\prime 2}$ as an estimate of $\sigma_u^2$ is seen to be

$$1 - \frac{N''}{N}\rho_u^4. \qquad (19)$$

Other formulas may be similarly obtained for other parameters. 







7. THE BIVARIATE CASE 


It is seen that the observed values of variable v do not enter into (1), 
(2), (6), (7), (9). Consequently, the formulas derived here for u and w 
apply without change to the bivariate case of incomplete data where 
one variable is observed for all individuals in the sample but the second 
variable is not. In other words, if complete data on w are available for 
N’+N” cases whereas data on u are available on only N’ of these 
cases, then the maximum-likelihood estimates of $\mu_u$, $\sigma_{u\cdot w}$, and $\beta_u$ are
given by equations (2), (7), and (9), as before.


REFERENCES 


[1] Lord, F., “Equating test scores—a maximum likelihood solution,” Psycho- 
metrika, in press. 

[2] Matthai, A., “Estimation of parameters from incomplete data with
application to design of sample surveys,” Sankhyā, 11 (1951), 145-52.

[3] Wilks, S., “Moments and distributions of estimates of population param- 
eters from fragmentary samples,” Annals of Mathematical Statistics, 3 
(1932), 163-203. 





TRUNCATED BINOMIAL AND NEGATIVE 
BINOMIAL DISTRIBUTIONS 


Paul R. Rider
Wright-Patterson Air Force Base 


INTRODUCTION 


Considerable interest has recently been manifested in truncated
distributions. (See references at end of paper.) Finney [10] has
treated the truncated binomial distribution and has shown how to 
estimate the parameter of the distribution by an iterative process 
which requires special tables. He mentions several practical problems 
in which a truncated binomial distribution might be met. The first part 
of the present paper shows how to estimate the parameter by a simple 
method analogous to that previously used [16] in estimating the 
parameter of a truncated Poisson distribution. The second part of the 
paper uses the same method to develop formulas for estimating the 
parameters of a truncated negative binomial distribution. 


THE POSITIVE BINOMIAL DISTRIBUTION 
If the probability of happening of an event in a single trial is p, the
probability that it will happen exactly x times in n trials is

$$p_x = \frac{n!}{x!(n-x)!}\,p^x(1-p)^{n-x}; \qquad x = 0, 1, \cdots, n. \qquad (1)$$

The expression (1) is called the binomial distribution function, since it
is the general term in the binomial expansion of $(q+p)^n$, in which
$q = 1 - p$.

A common problem is that of estimating the probability p from a set 
of data obtained by observing the results of an experiment. The prob- 
lem with which we are concerned here is that of estimating p from a 
sample taken from a truncated distribution. 

Let us designate by $f_x$ the frequency with which the value x occurs
in the sample. Then the expected value of $f_x$ is $Np_x$, N being the
unknown total of the untruncated sample. Suppose that k classes are
truncated from the lower end of the distribution.¹ We shall use the
following notation, in which x is the index of summation:

¹ If the truncation occurs at the upper end of the distribution the roles of p and q are simply interchanged.







$$T_0 = \sum_{k}^{n} f_x, \qquad T_1 = \sum_{k}^{n} x f_x, \qquad T_2 = \sum_{k}^{n} x^2 f_x, \qquad (2)$$

$$T_0' = N\sum_{0}^{k-1} p_x + T_0, \qquad (3)$$

$$T_1' = N\sum_{0}^{k-1} x p_x + T_1, \qquad (4)$$

$$T_2' = N\sum_{0}^{k-1} x^2 p_x + T_2. \qquad (5)$$

Then $T_1'/T_0'$ is an estimate of the first moment of the distribution,
namely np, and $T_2'/T_1'$ is an estimate of the second moment about the
origin, $np(q+np)$, divided by the first moment, this quotient being
$q+np$. We therefore set $T_1' = npT_0'$, $T_2' = (q+np)T_1'$, that is,

$$N\sum_{0}^{k-1}\frac{n!\,x}{x!(n-x)!}\,p^xq^{n-x} + T_1 = Nnp\sum_{0}^{k-1}\frac{n!}{x!(n-x)!}\,p^xq^{n-x} + npT_0, \qquad (6)$$

$$N\sum_{0}^{k-1}\frac{n!\,x^2}{x!(n-x)!}\,p^xq^{n-x} + T_2 = N(q+np)\sum_{0}^{k-1}\frac{n!\,x}{x!(n-x)!}\,p^xq^{n-x} + (q+np)T_1. \qquad (7)$$

Then we solve (6) for $q^{n-k+1}$, obtaining

$$q^{n-k+1} = \frac{(k-1)!\,(T_1 - npT_0)}{Nn(n-1)\cdots(n-k+1)\,p^k}. \qquad (8)$$

Next we substitute this value in (7) and solve for p. The resulting
estimate is

$$p = \frac{T_2 - kT_1}{(n-1)T_1 - (k-1)nT_0}. \qquad (9)$$

The estimate of N can be obtained from the equation

$$N - T_0 = N\sum_{0}^{k-1}\frac{n!}{x!(n-x)!}\,p^xq^{n-x}. \qquad (10)$$

THE NEGATIVE BINOMIAL DISTRIBUTION

The negative binomial distribution function is [12]

$$p_x = \frac{(m+x-1)!}{x!(m-1)!}\,\frac{p^x}{(1+p)^{m+x}}. \qquad (11)$$

This is the general term in the binomial expansion of $(r-p)^{-m}$, in which
$r = 1+p$. As we wish to estimate p and m, also the number N in a
sample before truncation, we need to use three moments. The first three
moments about the origin are


$$\mu_1' = mp, \qquad \mu_2' = mp(1 + p + mp),$$
$$\mu_3' = mp(1 + 3p + 2p^2 + 3mp + 3mp^2 + m^2p^2). \qquad (12)$$

We shall consider only the case in which the class corresponding to
x = 0 has been truncated; formulas for the general case are too
complicated to be of interest. In the notation of formulas (2)-(5) with k = 1,
we have, since $p_0 = (1+p)^{-m}$,

$$T_0' = N(1+p)^{-m} + T_0; \qquad T_i' = T_i, \quad i > 0. \qquad (13)$$

Now $T_2'/T_1'$ is an estimate of $\mu_2'/\mu_1'$, and consequently we set

$$T_2 = T_1(1 + p + mp), \qquad (14)$$

$$T_3 = T_1(1 + 3p + 2p^2 + 3mp + 3mp^2 + m^2p^2). \qquad (15)$$

Next we solve (14) for mp and substitute in (15).
Solving the resulting equation, we find the estimates

$$p = \frac{T_3T_1 - T_2T_1 + T_1^2 - T_2^2}{T_1(T_2 - T_1)}, \qquad (16)$$

$$r = 1 + p = \frac{T_3T_1 - T_2^2}{T_1(T_2 - T_1)}. \qquad (17)$$

From (14) and (17) we obtain the estimate

$$m = \frac{2T_2^2 - T_2T_1 - T_3T_1}{T_3T_1 - T_2T_1 + T_1^2 - T_2^2}. \qquad (18)$$

An estimate of N can now be obtained from the equation

$$N - T_0 = Nr^{-m}. \qquad (19)$$


COMPARISON WITH MAXIMUM LIKELIHOOD ESTIMATES 


In this section the proposed estimators are compared with maximum 
likelihood estimators. 

For the truncated binomial distribution the maximum likelihood 
estimate of p is given by the solution of the equation 





$$\nu_1^{(k)} = \frac{np - \displaystyle\sum_{0}^{k-1}\frac{n!\,x}{x!(n-x)!}\,p^x(1-p)^{n-x}}{1 - \displaystyle\sum_{0}^{k-1}\frac{n!}{x!(n-x)!}\,p^x(1-p)^{n-x}} \qquad (20)$$

in which the left side is the first moment of the sample from the
truncated distribution, k being the number of classes truncated. (Cf. [16].)
Special cases of (20) are

$$\nu_1' = \frac{np}{1 - (1-p)^n}, \qquad (21)$$

$$\nu_1'' = \frac{np - np(1-p)^{n-1}}{1 - (1-p)^n - np(1-p)^{n-1}}. \qquad (22)$$

For the purpose of comparison we use Table 1, taken from [11,
sec. 18]. This table exhibits data concerning 53,680 families having
eight children each. The number of boys in each such family is denoted
by x, the corresponding frequency by f. The mean number of boys is
4.117418; the value of p is one eighth of this, or 0.514677.

Using the estimator proposed (as if the frequencies of certain classes
were unknown), we find:

For k = 1: $T_0 = 53680 - 215 = 53465$, $T_1 = 221023$, $T_2 = 1021023$, and,
from (9), p = 0.517. The estimate of N, obtained by using (10), is
53624.


TABLE 1
NUMBERS OF BOYS IN FAMILIES HAVING EIGHT CHILDREN

Number      Number of
of Boys     Families
x           f            xf           x²f

0           215          0            0
1           1485         1485         1485
2           5331         10662        21324
3           10649        31947        95841
4           14959        59836        239344
5           11929        59645        298225
6           6678         40068        240408
7           2092         14644        102508
8           342          2736         21888

Total       53680        221023       1021023

















TABLE 2
VALUES OF ν₁′

p         n = 3

0.1       1.107
0.2       1.230
0.3       1.370
0.4       1.531
0.5       1.714
0.6       1.923
0.7       2.158
0.8       2.419
0.9       2.703




For k = 2: $T_0 = 53680 - 215 - 1485 = 51980$, $T_1 = 221023 - 1485
= 219538$, $T_2 = 1021023 - 1485 = 1019538$ and, from (9), p = 0.518.
From (10), our estimate of N is 53475.
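
The two computations above can be retraced with a short sketch. It is illustrative only: the function name `simple_truncated_binomial` is hypothetical, the frequencies are those of Table 1, and the formulas are (9) and (10) as given earlier. Because p is carried at full precision rather than rounded to three decimals, the estimate of N for k = 2 comes out a few units above the printed 53475.

```python
from math import comb

def simple_truncated_binomial(freqs, n, k):
    """Estimate p by (9) and N by (10) when classes 0..k-1 are truncated.

    freqs maps each value x to its observed frequency; only x >= k is used.
    """
    kept = {x: f for x, f in freqs.items() if x >= k}
    t0 = sum(kept.values())
    t1 = sum(x * f for x, f in kept.items())
    t2 = sum(x * x * f for x, f in kept.items())
    p = (t2 - k * t1) / ((n - 1) * t1 - (k - 1) * n * t0)      # eq. (9)
    q = 1.0 - p
    # Probability mass of the truncated classes under the fitted binomial.
    missing = sum(comb(n, x) * p**x * q**(n - x) for x in range(k))
    total = t0 / (1.0 - missing)                               # eq. (10) solved for N
    return p, total

# Numbers of boys in families of eight children (Table 1).
boys = {0: 215, 1: 1485, 2: 5331, 3: 10649, 4: 14959,
        5: 11929, 6: 6678, 7: 2092, 8: 342}

p1, n1 = simple_truncated_binomial(boys, n=8, k=1)
p2, n2 = simple_truncated_binomial(boys, n=8, k=2)
print(round(p1, 3), round(n1))
print(round(p2, 3), round(n2))
```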

In obtaining a maximum likelihood estimate of p from a binomial
distribution from which one class has been truncated (k = 1), a table
such as Table 2 is convenient. In the present example $\nu_1' = 221023/53465
= 4.1340$. In the column for n = 8 we interpolate between 4.016 and
4.803 and find p = 0.515, a correct estimate.
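
Where no table is at hand, the same equation can be solved numerically. The sketch below is an illustration under stated assumptions, not the procedure used in the paper: it evaluates the right side of (20) for a given p and finds the root by bisection, relying on the truncated mean increasing with p. The names `truncated_mean` and `solve_p` are hypothetical.

```python
from math import comb

def truncated_mean(p, n, k):
    """Right side of (20): mean of a binomial(n, p) with classes 0..k-1 removed."""
    q = 1.0 - p
    probs = [comb(n, x) * p**x * q**(n - x) for x in range(k)]
    numerator = n * p - sum(x * px for x, px in enumerate(probs))
    denominator = 1.0 - sum(probs)
    return numerator / denominator

def solve_p(sample_mean, n, k, lo=1e-6, hi=1.0 - 1e-6, tol=1e-10):
    """Bisection on p; assumes truncated_mean is increasing in p."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid, n, k) < sample_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# k = 1 in the example: sample mean of the truncated data is 221023/53465.
p_hat = solve_p(221023 / 53465, n=8, k=1)
print(round(p_hat, 3))
```

The two tabular entries used for interpolation can be checked the same way: `truncated_mean(0.5, 8, 1)` and `truncated_mean(0.6, 8, 1)` should reproduce 4.016 and 4.803 to three decimals.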

When k = 2 the situation is much more complicated. Equation (22)
must be solved. In the illustrative example, $\nu_1'' = 219538/51980
= 4.223509$, and (22) becomes

$$4.223509 = \frac{8p - 8p(1-p)^7}{1 - (1-p)^8 - 8p(1-p)^7}.$$

It is convenient to replace p by 1 - q. Making this substitution and
reducing, we are led to the equation

$$q^8 - 1.195609q^7 + 0.370903q - 0.175089 = 0.$$

Now this equation, although of eighth degree, is not too difficult to
solve, since so many powers of the unknown are missing. It has the
root q = 0.485, giving p = 0.515, again a correct estimate. However,
the work entailed in obtaining it, as compared with that required to
obtain the estimate by the simple method suggested in this paper,
seems hardly to justify the slight gain in accuracy. (It is recalled that
the estimate referred to was 0.518.)

For illustrating the estimators of the parameters of a truncated
negative binomial distribution we use Table 3, taken from [2, p. 186].










TABLE 3

NUMBERS (x) OF YEAST CELLS PER SQUARE
IN A HEMOCYTOMETER

x         f         xf        x²f

0         213       0         0
1         128       128       128
2         37        74        148
3         18        54        162
4         3         12        48
5         1         5         25

Total     400       273       511




















In this table, $x$ is the number of yeast cells per square in a
hemocytometer, and $f$ is the corresponding observed frequency in a
certain experiment. As well as can be ascertained, the estimates used by
the authors are $p = 0.192276$, $m = 3.549593$; they purport to have been
determined by the method of maximum likelihood.

If the frequency corresponding to $x=0$ were unknown, then in the
notation given earlier we should have $T_0 = 400 - 213 = 187$, $T_1 = 273$,
$T_2 = 511$, $T_3 = 1227$. The estimate of r, as given by (17), is

$$r = \frac{(1227)(273) - (511)^2}{273(511 - 273)} = 1.136608,$$

from which we get $p = 0.136608$. The estimate of $m$, as given by (18), is

$$m = \frac{2(511)^2 - (511)(273) - (1227)(273)}{(1227)(273) - (511)(273) + (273)^2 - (511)^2} = 5.381703.$$


From (19) we obtain the estimate N =376. These estimates are reason- 
ably good, especially when it is taken into consideration that over half 
of the observations are in the zero class. 
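
The arithmetic of this example is easy to check. The sketch below recomputes the two displayed ratios from the sums $T_0, \ldots, T_3$ quoted above. Equations (17) to (19) themselves are not reproduced in this part of the paper, so the last step rests on an assumption that is not taken from the text: that (19) scales $T_0$ by the estimated probability of a non-zero count, with the zero-class probability taken as `ratio ** (-m_hat)`. Under that assumption the result agrees with the quoted $N = 376$ after rounding.

```python
# Sums from Table 3 with the zero class omitted (values quoted in the text).
T0, T1, T2, T3 = 187, 273, 511, 1227

# First displayed ratio (the estimate given by (17)); p is this value minus one.
ratio = (T3 * T1 - T2 ** 2) / (T1 * (T2 - T1))
p_hat = ratio - 1.0

# Second displayed ratio (the estimate of m given by (18)).
m_hat = (2 * T2 ** 2 - T2 * T1 - T3 * T1) / (T3 * T1 - T2 * T1 + T1 ** 2 - T2 ** 2)

# ASSUMPTION (not from the text): the zero-class probability of the fitted
# negative binomial is ratio ** (-m_hat), and N is T0 divided by the
# probability of observing a non-zero count.
p_zero = ratio ** (-m_hat)
N_hat = T0 / (1.0 - p_zero)

print(round(ratio, 6), round(p_hat, 6), round(m_hat, 6), round(N_hat))
```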

Maximum likelihood equations for estimating the parameters of an 
untruncated negative binomial distribution are given in [2]. Since 
they are extremely difficult to solve and since the corresponding equa- 
tions for a truncated distribution would be worse, no investigation of 
such equations has been made in this study. 





BINOMIAL DISTRIBUTIONS 883 


REFERENCES 


[1] Bliss, C. I., “Estimation of the mean and its error from incomplete Poisson
distributions,” Connecticut Agricultural Experiment Station Bulletin, 513 
(1948), 12 pp. 

[2] Bliss, C. I., and Fisher, R. A., “Fitting the negative binomial distribution to 
biological data and note on the efficient fitting of the negative binomial,” 
Biometrics, 9 (1953), 176-200. 

[3] Cochran, W. G., “Use of IBM equipment in an investigation of the trun- 
cated normal problem,” Proceedings of Research Forum, International Busi- 
ness Machines Corp., 1946, 40-3. 

[4] Cohen, A. C., Jr., “On estimating the mean and standard deviation of 
truncated normal distributions,” Journal of the American Statistical Associa- 
tion, 44 (1949), 518-25. 

[5] Cohen, A. C., Jr., “Estimating parameters of Pearson type III populations 
from truncated samples,” Journal of the American Statistical Association, 
45 (1950), 411-23. 

[6] Cohen, A. C., Jr., “Estimating the mean and variance of normal populations 
from singly truncated and doubly truncated samples,” Annals of Mathe- 
matical Statistics, 21 (1950), 557-69. 

[7] Cohen, A. C., Jr., “Estimation of parameters in truncated Pearson fre- 
quency distributions,” Annals of Mathematical Statistics, 22 (1951), 255-65. 

[8] Cohen, A. C., Jr., “Estimation of the Poisson parameter from truncated 
samples and from censored samples,” Journal of the American Statistical 
Association, 49 (1954), 158-68.

[9] David, F. N., and Johnson, N. L., “The truncated Poisson,” Biometrics, 8 
(1952), 275-85. 

[10] Finney, D. J., “The truncated binomial distribution,” Annals of Eugenics,
14 (1949), 319-28. 

[11] Fisher, R. A., Statistical Methods for Research Workers, London: Oliver and 
Boyd, 11th Edition, 1940. 

[12] Fisher, R. A., “The negative binomial distribution,” Annals of Eugenics, 11, 
part 2 (1941), 182-7. This paper is reprinted as paper 38 in Fisher, R. A., 
Contributions to Mathematical Statistics, New York: John Wiley and Sons, 
Inc., 1950. 

[13] Hald, A., “Maximum likelihood estimation of the parameters of a normal 
distribution which is truncated at a known point,” Skandinavisk Aktuarietid- 
skrift, Haft 3—4 (1949), 119-34. 

[14] Ipsen, Johannes, Jr., “A practical method of estimating the mean and 
standard deviation of truncated normal distributions,” Human Biology, 21 
(1949), 1-16. 

[15] Moore, P. G., “The estimation of the Poisson parameter from a truncated 
distribution,” Biometrika, 39 (1952), 247-51. 

[16] Rider, Paul R., “Truncated Poisson distributions,” Journal of the American
Statistical Association, 48 (1953), 826-30. 

[17] Stevens, W. L., “The truncated normal distribution,” Appendix to “The 
calculation of the time-mortality curve” by C. I. Bliss, Annals of Applied 
Biology, 24 (1937), 815-52. 





RESTRICTION AND SELECTION IN SAMPLES FROM 
BIVARIATE NORMAL DISTRIBUTIONS* 


A. CLIFFORD COHEN, JR.
The University of Georgia 


1. INTRODUCTION AND SUMMARY 


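THE problem considered here is that of estimating parameters of a
bivariate normal population with probability density (frequency)
function

$$f(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-m_x}{\sigma_x}\right)^2 - 2\rho\left(\frac{x-m_x}{\sigma_x}\right)\left(\frac{y-m_y}{\sigma_y}\right) + \left(\frac{y-m_y}{\sigma_y}\right)^2\right]\right\} \qquad (1)$$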


from restricted samples of types which arise when acceptance or screening
procedures based on one variate eliminate certain sample specimens
from further observation with respect to the other variate. This
problem is important in connection with correlation studies which 
relate entrance examination scores to subsequent achievement scores 
and in numerous similar situations. In analyses of acceptance inspec- 
tion data, it becomes important when a physical characteristic such as 
weight, density, size, hardness, etc., must be correlated with a per- 
formance characteristic such as life span, operating cost, sales volume or 
other characteristic for which observations are available only on ac- 
cepted items. 

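Without loss of generality, we let $x$ be the restricted variate, and in
order to emphasize the dependence of $y$ on $x$, we write $f(x, y)$ as the
product of the marginal frequency function of $x$ and the conditional
(array) frequency function of $y$. Thus (1) becomes

$$f(x,y) = \frac{1}{\sigma_x\sqrt{2\pi}}\exp\left[-\frac12\left(\frac{x-m_x}{\sigma_x}\right)^2\right]\cdot\frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac12\left(\frac{y-\alpha-\beta(x-\bar{x})}{\sigma}\right)^2\right], \qquad (2)$$

where

$$\beta = \rho\sigma_y/\sigma_x, \qquad \alpha = m_y - \beta(m_x - \bar{x}), \qquad \sigma^2 = \sigma_y^2(1-\rho^2), \qquad (3)$$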





* Sponsored by the Office of Ordnance Research, U. S. Army, under contract DA-01-009-ORD-288. 







RESTRICTION AND SELECTION IN SAMPLES 885 


and for a given sample with $n$ accepted specimens, $\bar{x}$ is the $x$-mean of
the accepted specimens only ($\bar{x} = \sum_{i=1}^{n} x_i/n$). The introduction of
$\bar{x}$ into (2) is for the purpose of ensuring the mutual independence of
estimates of $\alpha$, $\beta$, and $\sigma$, as is noted in Section 3. Regardless of the
value of $x$, the corresponding $y$-array distribution is normal with
standard deviation $\sigma$ and with mean $\alpha + \beta(x - \bar{x})$.

The method of maximum likelihood is employed to estimate parameters
of (2), and asymptotic variances and covariances are obtained.
Estimators (estimates) of $m_y$, $\sigma_y$, and $\rho$ are then obtained from (3).
To demonstrate the practical application of results obtained here,
an illustrative example is included.

Results somewhat closely related to the problem considered here
have been previously published by various authors including Pearson
[9], Aitken [1], Wilks [11], Lawley [8], Birnbaum [2], Birnbaum,
Paulson and Andrews [3], Campbell [4], Des Raj [10], and the
writer [5, 6].


2. MAXIMUM LIKELIHOOD ESTIMATION 


Consider the random selection of $N$ sample specimens from a population
distributed according to (1). Each of these specimens is either
measured or censored with respect to $x$, or is entirely eliminated from
observation. Of the specimens measured with respect to $x$, those which
meet acceptance criteria are also measured with respect to $y$. Acceptance
criteria may or may not be the same as those which govern observation
with respect to $x$. Thus, each specimen measured with respect
to $y$ is also measured with respect to $x$, but the converse of this
statement is not necessarily true. We let $n\ (\le N)$ designate the number
of “accepted” sample specimens and therefore the number of paired
observations $(x, y)$. Using (2), the likelihood function for a sample of
this type may be put into the form


$$P = k\,G(m_x,\sigma_x)\,\frac{1}{(\sigma_x\sqrt{2\pi})^{n}}\exp\left[-\frac12\sum_{i=1}^{n}\left(\frac{x_i-m_x}{\sigma_x}\right)^2\right]\times\frac{1}{(\sigma\sqrt{2\pi})^{n}}\exp\left[-\frac12\sum_{i=1}^{n}\left(\frac{y_i-\alpha-\beta(x_i-\bar{x})}{\sigma}\right)^2\right], \qquad (4)$$

where $k$ is a constant and $G(m_x, \sigma_x)$ is a restriction function which
depends on the restrictions imposed with respect to observation of $x$,
and on the acceptance criteria. The function $G(m_x, \sigma_x)$ also may
depend on the restricted values of $x$. For an unrestricted (complete)
sample,


$$G(m_x, \sigma_x) = 1, \quad \text{with } n = N. \qquad (5)$$


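To estimate $m_x$, $\sigma_x$, $\alpha$, $\beta$, and $\sigma$, we take logarithms of (4),
differentiate and equate to zero. With $L$ written for $\log P$, we thereby obtain

$$\begin{aligned}
\frac{\partial L}{\partial m_x} &= \sum_{i=1}^{n}\frac{x_i-m_x}{\sigma_x^2} + \frac{1}{G}\frac{\partial G}{\partial m_x} = 0,\\
\frac{\partial L}{\partial \sigma_x} &= -\frac{n}{\sigma_x} + \sum_{i=1}^{n}\frac{(x_i-m_x)^2}{\sigma_x^3} + \frac{1}{G}\frac{\partial G}{\partial \sigma_x} = 0,\\
\frac{\partial L}{\partial \alpha} &= \sum_{i=1}^{n}\left[y_i-\alpha-\beta(x_i-\bar{x})\right]/\sigma^2 = 0,\\
\frac{\partial L}{\partial \beta} &= \sum_{i=1}^{n}\left[y_i-\alpha-\beta(x_i-\bar{x})\right]\left[x_i-\bar{x}\right]/\sigma^2 = 0,\\
\frac{\partial L}{\partial \sigma} &= -\frac{n}{\sigma} + \sum_{i=1}^{n}\left[y_i-\alpha-\beta(x_i-\bar{x})\right]^2/\sigma^3 = 0.
\end{aligned} \qquad (6)$$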


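The last three equations of (6) are independent of $m_x$ and $\sigma_x$, and are
therefore unaffected by $G(m_x, \sigma_x)$. Consequently, regardless of the
acceptance criteria or of the type of restriction imposed on observation
of $x$, these are the familiar “normal” equations, and have the
well-known solutions

$$\hat\beta = \bar{r}\,\bar{s}_y/\bar{s}_x, \qquad \hat\alpha = \bar{y}, \qquad \text{and} \qquad \hat\sigma = \bar{s}_y\sqrt{1-\bar{r}^2}, \qquad (7)$$

where (^) serves to distinguish maximum likelihood estimates from
parameters estimated and the bars (¯) indicate that the statistics
thus designated are computed solely from observations made on the
$n$ accepted sample specimens. Accordingly, $\bar{x} = \sum_{1}^{n} x_i/n$, $\bar{y} = \sum_{1}^{n} y_i/n$,
$\bar{s}_x^2 = \sum_{1}^{n}(x_i-\bar{x})^2/n$, $\bar{s}_y^2 = \sum_{1}^{n}(y_i-\bar{y})^2/n$, and
$\bar{r} = \sum_{1}^{n}(x_i-\bar{x})(y_i-\bar{y})/n\bar{s}_x\bar{s}_y$.
Estimators (estimates) $\hat{m}_y$, $\hat\sigma_y$, and $\hat\rho$ follow from (3) and (7) as

$$\begin{aligned}
\hat{m}_y &= \hat\alpha + \hat\beta(\hat{m}_x - \bar{x}),\\
\hat\sigma_y &= \sqrt{\hat\sigma^2 + \hat\sigma_x^2\hat\beta^2},\\
\hat\rho &= \hat\sigma_x\hat\beta\big/\sqrt{\hat\sigma^2 + \hat\sigma_x^2\hat\beta^2},
\end{aligned} \qquad (8)$$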





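and in the equivalent forms

$$\begin{aligned}
\hat{m}_y &= \bar{y} - \bar{r}(\bar{s}_y/\bar{s}_x)(\bar{x} - \hat{m}_x),\\
\hat\sigma_y &= \bar{s}_y\sqrt{\left[1-\lambda(1-\bar{r}^2)\right]/(1-\lambda)},\\
\hat\rho &= \bar{r}\big/\sqrt{1-\lambda(1-\bar{r}^2)},
\end{aligned} \qquad (9)$$

where $\lambda = 1 - \bar{s}_x^2/\hat\sigma_x^2$.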

Estimators given in (9) above were obtained earlier in reference [6]
by setting up the likelihood function directly from (1) rather than from
(2) as has been done here. It was then maximized with respect to
$m_x$, $\sigma_x$, $m_y$, $\sigma_y$, and $\rho$ without the introduction of $\alpha$, $\beta$, and $\sigma$.
Similar results were obtained independently at about the same time by Des
Raj [10].

The first two equations of (6) are simply the estimating equations
for restricted samples from univariate normal populations in a form
that differs only slightly from corresponding equations given in reference
[5]. In practice, therefore, it is merely necessary that parameters
of the $x$-marginal distribution be estimated as in the univariate cases.
With $\hat{m}_x$ and $\hat\sigma_x$ thus determined, $\hat{m}_y$, $\hat\sigma_y$, and $\hat\rho$ follow from (7) and (8)
or from (9).
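
The two-stage computation described above can be sketched in a few lines. The example below is an illustration, not the author's program: the function names are hypothetical, and the inputs are the accepted-sample statistics and the $x$-marginal estimates quoted in the illustrative example of Section 4. It applies (7) and then (8), and the results can be compared with the tabulated estimates for the selected, censored, and truncated samples.

```python
import math


def regression_estimates(y_bar, sx_bar, sy_bar, r_bar):
    """Equation (7): estimates of alpha, beta, sigma from the accepted specimens."""
    beta = r_bar * sy_bar / sx_bar
    alpha = y_bar
    sigma = sy_bar * math.sqrt(1.0 - r_bar ** 2)
    return alpha, beta, sigma


def y_parameters(alpha, beta, sigma, m_x, sigma_x, x_bar):
    """Equation (8): estimates of m_y, sigma_y, rho given the x-marginal estimates."""
    m_y = alpha + beta * (m_x - x_bar)
    sigma_y = math.sqrt(sigma ** 2 + sigma_x ** 2 * beta ** 2)
    rho = sigma_x * beta / sigma_y
    return m_y, sigma_y, rho


# Statistics of the n = 517 accepted candidates (Section 4).
x_bar, sx_bar = 164.5919, 2.7036
y_bar, sy_bar, r_bar = 77.5048, 5.3470, 0.3892
alpha, beta, sigma = regression_estimates(y_bar, sx_bar, sy_bar, r_bar)

# x-marginal estimates (m_x, sigma_x) quoted in the table of Section 4.
x_marginal = {
    "selected": (164.441, 2.847),
    "censored": (164.451, 2.834),
    "truncated": (164.191, 3.058),
}
results = {
    kind: y_parameters(alpha, beta, sigma, m_x, s_x, x_bar)
    for kind, (m_x, s_x) in x_marginal.items()
}
for kind, values in results.items():
    print(kind, [round(v, 3) for v in values])
```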

Singly Truncated Samples. When the sampling procedure is such that
selection is continued until $n$ specimens for which $x \ge x_0$ have been
measured with respect to $x$, accepted, and subsequently measured with
respect to $y$, and when moreover, it is not possible to observe specimens
for which $x < x_0$, so that the number of eliminated observations is
unknown, the sample is said to be singly truncated on the left at terminal
$x_0$, with respect to the $x$-marginal distribution. The restriction function
for a sample of this type is

$$G(m_x, \sigma_x) = \left[I_0(\xi)\right]^{-n}, \qquad (10)$$

where

$$I_0(\xi) = \int_{\xi}^{\infty}\phi(t)\,dt, \qquad \xi = (x_0 - m_x)/\sigma_x, \qquad \phi(t) = (\sqrt{2\pi})^{-1}\exp(-t^2/2). \qquad (11)$$

As is well known (cf. [5]), estimating equations for $\hat{m}_x$ and $\hat\sigma_x$,
in this case, become

$$\begin{aligned}
&\text{a.}\quad \left[1-\hat\xi(Z-\hat\xi)\right]/(Z-\hat\xi)^2 - \nu_2/\nu_1^2 = 0,\\
&\text{b.}\quad \hat\sigma_x = \nu_1/(Z-\hat\xi),\\
&\text{c.}\quad \hat{m}_x = x_0 - \hat\sigma_x\hat\xi,
\end{aligned} \qquad (12)$$

where $Z(\xi) = \phi(\xi)/I_0(\xi)$ and $\nu_k = \sum_{1}^{n}(x_i - x_0)^k/n$. With $\nu_1$ and $\nu_2$
calculated from sample data, $\hat\xi$ is obtained as the solution of (12a) and
$\hat\sigma_x$ and $\hat{m}_x$ follow from (12b) and (12c). Tables of the first member of
(12a) (multiplied by 4) and of $1/(Z-\xi)$, which are available in reference
[7], greatly facilitate computation of these estimates.

Singly Censored Samples. When the sampling procedure is such that
$N$ sample specimens are selected from a population distributed according
to (1), and measurements of both $x$ and $y$ are recorded if $x \ge x_0$,
while a count is kept of censored specimens for which $x < x_0$, although
no further measurements are made, the resulting sample is of the singly
censored type. The $x$-marginal sample data thus consist of $n$ measured
observations for which $x \ge x_0$, and the information that $n_1 = (N-n)$
sample specimens were selected for which $x < x_0$. The estimating equations
in this case differ from (12) for the truncated case only in that
$Z(\xi)$ in (12) is replaced with $Y(\xi)$ where

$$Y(\xi) = (n_1/n)\left\{\phi(\xi)/\left[1 - I_0(\xi)\right]\right\} = (n_1/n)Z(-\xi). \qquad (13)$$
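
In place of the published tables, (12a) can be solved for $\hat\xi$ by a simple root search. The sketch below is one way to do this under stated assumptions: it uses bisection, the bracketing intervals are chosen by hand for the data of Section 4 and are not a general prescription, and the function names are hypothetical. The moments are formed from the quoted statistics through the identities $\nu_1 = \bar{x} - x_0$ and $\nu_2 = \bar{s}_x^2 + \nu_1^2$. Passing `weight = n_1/n` switches from $Z(\xi)$ to $Y(\xi)$ of (13).

```python
import math


def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def upper_tail(t):
    """I_0(t): standard normal probability above t."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def Z(t):
    return phi(t) / upper_tail(t)


def solve_singly_restricted(x0, nu1, nu2, weight=None, lo=-6.0, hi=6.0):
    """Solve (12a)-(12c); weight=None is the truncated case, weight=n1/n the censored."""
    target = nu2 / nu1 ** 2

    def h(xi):
        return Z(xi) if weight is None else weight * Z(-xi)

    def g(xi):
        d = h(xi) - xi
        return (1.0 - xi * d) / d ** 2 - target

    for _ in range(200):                      # bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    xi = 0.5 * (lo + hi)
    sigma_x = nu1 / (h(xi) - xi)              # (12b)
    m_x = x0 - sigma_x * xi                   # (12c)
    return xi, sigma_x, m_x


# Data of Section 4: x0 = 159.5, accepted-sample mean and standard deviation.
x0, x_bar, sx_bar, n, N = 159.5, 164.5919, 2.7036, 517, 529
nu1 = x_bar - x0
nu2 = sx_bar ** 2 + nu1 ** 2

truncated = solve_singly_restricted(x0, nu1, nu2)
censored = solve_singly_restricted(x0, nu1, nu2, weight=(N - n) / n, hi=0.0)
print([round(v, 3) for v in truncated])
print([round(v, 3) for v in censored])
```

The values printed can be compared with the truncated and censored columns of the table in Section 4.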


Selected Samples. When the sampling procedure is such that full
measurement is made and recorded with respect to $x$ for each of the
$N$ sample specimens although corresponding $y$ values are determined
only for the $n$ accepted specimens, the restriction function can be
expressed as

$$G(m_x, \sigma_x) = (\sigma_x\sqrt{2\pi})^{\,n-N}\exp\left\{-\frac12\sum_{i=n+1}^{N}\left[(x_i - m_x)/\sigma_x\right]^2\right\}. \qquad (14)$$


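In this case, the first two equations of (6) become

$$\sum_{i=1}^{N}\frac{x_i - m_x}{\sigma_x^2} = 0, \qquad -\frac{N}{\sigma_x} + \sum_{i=1}^{N}\frac{(x_i - m_x)^2}{\sigma_x^3} = 0. \qquad (15)$$

From these, we obtain the usual estimates

$$\hat{m}_x = \bar{x}_N = \sum_{i=1}^{N} x_i/N, \qquad \hat\sigma_x = s_x = \sqrt{\sum_{i=1}^{N}(x_i - \bar{x}_N)^2/N}, \qquad (16)$$

where $\bar{x}_N$ is the $x$-mean of the entire $N$ sample observations and is to be
distinguished from $\bar{x}$ which is used elsewhere in this paper to designate
the mean of the $n$ accepted sample specimens.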







Doubly Restricted Samples. When a double restriction is imposed on
observation of $x$ at a lower and an upper terminal, such that measurements
are limited to specimens for which $x$ lies between the two terminals, the
$x$-marginal samples are doubly truncated or doubly censored according to
whether the number of unmeasured specimens is unknown or known. With samples
of these types, the univariate estimating equations given in reference
[5] are applicable for determining $\hat{m}_x$ and $\hat\sigma_x$.


3. RELIABILITY OF ESTIMATES 


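The variance-covariance matrix of $(\hat{m}_x, \hat\sigma_x, \hat\alpha, \hat\beta, \hat\sigma)$ can be derived
from second order derivatives of $L$. Differentiating (6), we obtain

$$\begin{aligned}
\frac{\partial^2 L}{\partial m_x^2} &= -\frac{n}{\sigma_x^2} + \frac{1}{G}\frac{\partial^2 G}{\partial m_x^2} - \left(\frac{1}{G}\frac{\partial G}{\partial m_x}\right)^2,\\
\frac{\partial^2 L}{\partial m_x\,\partial\sigma_x} &= -\frac{2}{\sigma_x^3}\sum_{i=1}^{n}(x_i - m_x) + \frac{1}{G}\frac{\partial^2 G}{\partial m_x\,\partial\sigma_x} - \left(\frac{1}{G}\frac{\partial G}{\partial m_x}\right)\left(\frac{1}{G}\frac{\partial G}{\partial\sigma_x}\right),\\
\frac{\partial^2 L}{\partial\sigma_x^2} &= -\frac{3}{\sigma_x^4}\sum_{i=1}^{n}(x_i - m_x)^2 + \frac{n}{\sigma_x^2} + \frac{1}{G}\frac{\partial^2 G}{\partial\sigma_x^2} - \left(\frac{1}{G}\frac{\partial G}{\partial\sigma_x}\right)^2,\\
\frac{\partial^2 L}{\partial\alpha^2} &= -\frac{n}{\sigma^2}, \qquad
\frac{\partial^2 L}{\partial\beta^2} = -\frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \bar{x})^2,\\
\frac{\partial^2 L}{\partial\sigma^2} &= \frac{n}{\sigma^2} - \frac{3}{\sigma^4}\sum_{i=1}^{n}\left[y_i - \alpha - \beta(x_i - \bar{x})\right]^2,\\
\frac{\partial^2 L}{\partial\alpha\,\partial\beta} &= -\frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \bar{x}), \qquad
\frac{\partial^2 L}{\partial\alpha\,\partial\sigma} = -\frac{2}{\sigma^3}\sum_{i=1}^{n}\left[y_i - \alpha - \beta(x_i - \bar{x})\right],\\
\frac{\partial^2 L}{\partial\beta\,\partial\sigma} &= -\frac{2}{\sigma^3}\sum_{i=1}^{n}\left[y_i - \alpha - \beta(x_i - \bar{x})\right]\left[x_i - \bar{x}\right],
\end{aligned} \qquad (17)$$

while every mixed derivative of $L$ with respect to one of $m_x$, $\sigma_x$ and one
of $\alpha$, $\beta$, $\sigma$ is zero.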










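To determine expected values of the terms of (17), we require the
following results:

$$\begin{aligned}
E\left[\frac{y - \alpha - \beta(x - \bar{x})}{\sigma}\right] &= \frac{1}{E(n/N)}\int_R \phi(u)\,du\int_{-\infty}^{\infty} v\,\phi(v)\,dv = 0,\\
E\left[\left(\frac{y - \alpha - \beta(x - \bar{x})}{\sigma}\right)^2\right] &= \frac{1}{E(n/N)}\int_R \phi(u)\,du\int_{-\infty}^{\infty} v^2\phi(v)\,dv = 1,\\
E\left[\left(\frac{y - \alpha - \beta(x - \bar{x})}{\sigma}\right)(x - \bar{x})\right] &= \frac{1}{E(n/N)}\int_R (x - \bar{x})\,\phi(u)\,du\int_{-\infty}^{\infty} v\,\phi(v)\,dv = 0,
\end{aligned} \qquad (18)$$

where $u = (x - m_x)/\sigma_x$, $v = [y - \alpha - \beta(x - \bar{x})]/\sigma$, and $R$ designates the
acceptance interval with respect to $x$. Using these results and taking
expected values of the derivatives of (17), the non-vanishing terms
become

$$\begin{aligned}
-E\left[\frac{\partial^2 L}{\partial m_x^2}\right] &= \frac{E(n^*)}{\sigma_x^2}\phi_{11}, &
-E\left[\frac{\partial^2 L}{\partial m_x\,\partial\sigma_x}\right] &= \frac{E(n^*)}{\sigma_x^2}\phi_{12},\\
-E\left[\frac{\partial^2 L}{\partial\sigma_x^2}\right] &= \frac{E(n^*)}{\sigma_x^2}\phi_{22}, &
-E\left[\frac{\partial^2 L}{\partial\alpha^2}\right] &= \frac{E(n)}{\sigma^2},\\
-E\left[\frac{\partial^2 L}{\partial\beta^2}\right] &= \frac{\bar\sigma_x^2E(n)}{\sigma^2}, &
-E\left[\frac{\partial^2 L}{\partial\sigma^2}\right] &= \frac{2E(n)}{\sigma^2}.
\end{aligned} \qquad (19)$$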














Expected values of all derivatives of (17) not shown in (19) are zeros.
We take note of the fact that $\partial^2L/\partial\alpha\,\partial\beta = 0$, $\partial^2L/\partial\alpha\,\partial\sigma = 0$, and
$\partial^2L/\partial\beta\,\partial\sigma = 0$, as a consequence of the manner in which $\bar{x}$ was
introduced into equation (2). We have written $n^*$ to denote the number of
observations which resulted in measurements of $x$. In the truncated and
censored samples considered here, $n^* = n$, but in the selected samples,
$n^* = N$. The $\phi_{ij}$ depend on the restrictions imposed with respect to
observation of $x$ and are available from results given in [5]. For singly
truncated and singly censored samples, they are as follows:

$$\begin{array}{ll}
\text{Truncated Samples} & \text{Censored Samples}\\
\phi_{11} = 1 - Z(\xi)\left[Z(\xi) - \xi\right], & \phi_{11} = 1 + Z(\xi)\left[Z(-\xi) + \xi\right],\\
\phi_{12} = Z(\xi)\left\{1 - \xi\left[Z(\xi) - \xi\right]\right\}, & \phi_{12} = Z(\xi)\left\{1 + \xi\left[Z(-\xi) + \xi\right]\right\},\\
\phi_{22} = 2 + \xi\phi_{12}, & \phi_{22} = 2 + \xi\phi_{12}.
\end{array} \qquad (20)$$

In (19) and subsequently, $\bar\sigma_x^2 = E(\bar{s}_x^2)$. For singly truncated and singly
censored samples,

$$\bar\sigma_x^2 = \sigma_x^2\left[1 - Z(Z - \xi)\right]. \qquad (21)$$
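Now inverting the information matrix whose non-zero elements are
given by (19), we find the non-zero elements of the asymptotic
variance-covariance matrix to be

$$\begin{aligned}
&V(\hat\alpha) \sim \sigma^2/E(n), \qquad V(\hat\beta) \sim \sigma^2/\bar\sigma_x^2E(n), \qquad V(\hat\sigma) \sim \sigma^2/2E(n),\\
&V(\hat{m}_x) \sim \left[\sigma_x^2/E(n^*)\right]\left[\phi_{22}/(\phi_{11}\phi_{22} - \phi_{12}^2)\right],\\
&V(\hat\sigma_x) \sim \left[\sigma_x^2/E(n^*)\right]\left[\phi_{11}/(\phi_{11}\phi_{22} - \phi_{12}^2)\right],\\
&\mathrm{Cov}(\hat{m}_x, \hat\sigma_x) \sim \left[\sigma_x^2/E(n^*)\right]\left[-\phi_{12}/(\phi_{11}\phi_{22} - \phi_{12}^2)\right].
\end{aligned} \qquad (22)$$

In the singly restricted cases, $E(n^*) = E(n) = NI_0(\xi)$, and in the doubly
restricted cases, $E(n^*) = E(n) = N[I_0(\xi_1) - I_0(\xi_2)]$. When the sampling
procedure is such that $n^* = n$ is fixed, then $E(n^*) = E(n) = n$, and (22)
is also applicable in that case.

For a selected sample, $\phi_{11} = 1$, $\phi_{12} = 0$, and $\phi_{22} = 2$, since each of the
$N$ sample specimens is observed and measured without restriction with
respect to $x$. Thus with $E(n^*) = N$, it follows from (22), that for a
sample of this type

$$V(\hat{m}_x) = \sigma_x^2/N, \qquad \mathrm{Cov}(\hat{m}_x, \hat\sigma_x) = 0, \qquad V(\hat\sigma_x) = \sigma_x^2/2N. \qquad (23)$$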
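
The formulas (20) to (23) translate directly into a short computation. The following sketch is illustrative only: the function names are hypothetical, and the numerical inputs are the truncated-sample estimates quoted in Section 4, substituted for the unknown parameters as the paper does when it forms estimated variances. No printed variances are asserted here, since none are reproduced in this part of the text.

```python
import math


def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def Z(t):
    """Z(t) = phi(t) / I_0(t), with I_0 the upper-tail normal probability."""
    return phi(t) / (0.5 * math.erfc(t / math.sqrt(2.0)))


def phi_terms(xi, kind):
    """phi_11, phi_12, phi_22 of (20); (1, 0, 2) for a selected sample."""
    if kind == "selected":
        return 1.0, 0.0, 2.0
    if kind == "truncated":
        p11 = 1.0 - Z(xi) * (Z(xi) - xi)
        p12 = Z(xi) * (1.0 - xi * (Z(xi) - xi))
    elif kind == "censored":
        p11 = 1.0 + Z(xi) * (Z(-xi) + xi)
        p12 = Z(xi) * (1.0 + xi * (Z(-xi) + xi))
    else:
        raise ValueError(kind)
    return p11, p12, 2.0 + xi * p12


def x_marginal_variances(sigma_x, expected_n_star, xi, kind):
    """V(m_x), V(sigma_x), Cov(m_x, sigma_x) from (22)."""
    p11, p12, p22 = phi_terms(xi, kind)
    det = p11 * p22 - p12 ** 2
    scale = sigma_x ** 2 / expected_n_star
    return scale * p22 / det, scale * p11 / det, -scale * p12 / det


def regression_variances(sigma, sigma_bar_x_sq, expected_n):
    """V(alpha), V(beta), V(sigma) from (22)."""
    return (sigma ** 2 / expected_n,
            sigma ** 2 / (sigma_bar_x_sq * expected_n),
            sigma ** 2 / (2.0 * expected_n))


# Truncated-sample estimates quoted in Section 4 (n = 517 treated as fixed).
sigma_x, xi, n = 3.058, -1.534, 517
v_m, v_s, cov = x_marginal_variances(sigma_x, n, xi, "truncated")
sigma_bar_x_sq = sigma_x ** 2 * (1.0 - Z(xi) * (Z(xi) - xi))      # equation (21)
v_alpha, v_beta, v_sigma = regression_variances(4.9254, sigma_bar_x_sq, n)
print(v_m, v_s, cov, v_alpha, v_beta, v_sigma)
```

As a consistency check, the value of $\bar\sigma_x^2$ obtained from (21) should be close to the observed $\bar{s}_x^2$ of the accepted specimens.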


4. AN ILLUSTRATIVE EXAMPLE


To illustrate the practical application of results obtained in this
paper, we employ a sample consisting of entrance examination scores,
$x$, and subsequent course averages, $y$, achieved by a group of 529
college students.¹ For purposes of this illustration, we assume the
minimum qualifying score on the entrance examination to be 159.5. Under
this requirement, there are 517 candidates whom we consider as being
accepted for admission. By making appropriate further assumptions,
we use the same basic data to illustrate estimation from selected,
truncated and censored samples. For comparison, estimates are also
computed from the full (unrestricted) sample. Except for the truncated
sample, we consider $N$ as being fixed and $n$ as a random variable.
By the definition of a truncated sample, the total sample size, $N$, is
unknown, and in that case, we consider $n$ as fixed. Basic data for the
sample selected are summarized as follows: $N = 529$, $n = 517$, $x_0 = 159.5$,
$\bar{x} = 164.5919$, $\bar{x}_N = 164.441$, $s_x = 2.847$, $\bar{s}_x = 2.7036$, $\bar{y} = 77.5048$,
$\bar{y}_N = 77.380$, $s_y = 5.405$, $\bar{s}_y = 5.3470$, $\bar{r} = 0.3892$, $r = 0.409$, where
$\bar{x}_N$, $\bar{y}_N$, $s_x$, $s_y$, and $r$ are based on the full sample with $N = 529$, while
$\bar{x}$, $\bar{y}$, $\bar{s}_x$, $\bar{s}_y$, and $\bar{r}$ are based only on observations of the $n = 517$
accepted candidates. From these data, estimates were computed for the
different samples, using (7), (8), (12), and (16) as applicable. Estimates
of the elements of the variance-covariance matrix were formed by using the
maximum likelihood estimates of the parameters in (22). Results of
these computations are summarized in the accompanying table. For
the full sample, since $n = N$, estimates $\hat\alpha$, $\hat\beta$, and $\hat\sigma$ were computed from
(7) with $\bar{y}$, $\bar{s}_x$, $\bar{s}_y$, and $\bar{r}$ replaced by $\bar{y}_N$, $s_x$, $s_y$, and $r$, while
$\hat{m}_x$ and $\hat\sigma_x$ were computed from (16). Variances of the full sample
estimates were computed using (22) with $\bar\sigma_x^2 = s_x^2$, $\phi_{11} = 1$, $\phi_{12} = 0$,
$\phi_{22} = 2$, and $E(n^*) = E(n) = N$.

¹ Given by Goedicke, “Introduction to the Theory of Statistics,” New York: Harper and Brothers,
1953, 175-77.


TABLE OF ESTIMATES AND THEIR VARIANCES

                                   Type of Sample
Parameters          Complete    Selected    Censored    Truncated

$\hat{m}_x$          164.441     164.441     164.451     164.191
$\hat\sigma_x$         2.847       2.847       2.834       3.058
$\hat\alpha$          77.380      77.5048     77.5048     77.5048
$\hat\beta$            0.7765      0.7697      0.7697      0.7697
$\hat\sigma$           4.9307      4.9254      4.9254      4.9254
$\hat{m}_y$           77.380      77.389      77.396      77.196
$\hat\sigma_y$         5.405       5.391       5.387       5.459
$\hat\rho$             0.409       0.406       0.405       0.431
$\hat\xi$                         -1.736      -1.747      -1.534
$E(n)$                           507.162     507.671     517 (fixed)

Variance

$V(\hat{m}_x)$
$V(\hat\sigma_x)$
$\mathrm{Cov}(\hat{m}_x, \hat\sigma_x)$
$V(\hat\alpha)$
$V(\hat\beta)$
$V(\hat\sigma)$






















Considering the degree of approximation involved in their calcula- 
tion, the tabulated variances reflect the varying amounts of informa- 
tion provided in the different types of samples, minimum information 
being contained in the truncated sample, and maximum information 
in the complete sample. The calculated variances are approximate not 
only due to using asymptotic values, but also due to their dependence 
on sample values. In comparing the truncated sample variances with 
those for the other samples, allowance should be made for the effective 
differences in sample sizes. The truncated sample with n=517 (fixed) 
corresponds to a censored or selected sample of total sample size,
N=551. 
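
The expected sample sizes in the table, and the equivalent total sample size just mentioned, follow from $E(n) = NI_0(\xi)$. The sketch below checks them with the rounded values of $\hat\xi$ from the table; the helper name `upper_tail` is illustrative, and dividing $n$ by $I_0(\hat\xi)$ for the truncated sample is an interpretation of the closing remark, not a formula stated in the paper.

```python
import math


def upper_tail(t):
    """I_0(t): standard normal probability above t."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


N, n = 529, 517

# E(n) = N * I_0(xi) for the selected and censored samples.
expected_n_selected = N * upper_tail(-1.736)
expected_n_censored = N * upper_tail(-1.747)

# Total sample size that would be expected to yield n accepted specimens
# under the truncated-sample estimate of xi.
equivalent_N_truncated = n / upper_tail(-1.534)

print(round(expected_n_selected, 3),
      round(expected_n_censored, 3),
      round(equivalent_N_truncated, 1))
```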


REFERENCES 


[1] Aitken, A. C., “Note on selection from a multivariate normal population,” 
Proceedings of the Edinburgh Mathematical Society, 4 (1934), 106-10.

[2] Birnbaum, Z. W., “Effect of linear truncation on a multi-normal popula- 
tion,” Annals of Mathematical Statistics, 21 (1950), 272-79. 

[3] Birnbaum, Z. W., Paulson, E., and Andrews, F. C., “On the effect of selec- 
tion performed on some coordinates of a multi-dimensional population,” 
Psychometrika, 15 (1950), 191-204. 

[4] Campbell, Francis L., “A study of truncated bivariate normal distribu- 
tions,” Doctoral Dissertation, University of Michigan, June 1945. 

[5] Cohen, A. C., Jr., “Estimating the mean and variance of normal popula- 
tions from singly truncated and doubly truncated samples,” Annals of 
Mathematical Statistics, 21 (1950), 557-69. 

[6] Cohen, A. C., Jr., “Estimation in truncated bivariate normal distributions,” 
Department of Mathematics, University of Georgia, Technical Report
No. 2. Contract DA-01-009-ORD-288 (June 1953). 

[7] Cohen, A. C., Jr., and Woodward, John, “Tables of Pearson-Lee-Fisher 
functions of singly truncated normal distributions,” Biometrics, 9 (1953), 
489-97. 

[8] Lawley, D. N., “A note on Karl Pearson’s selection formulae,” Proceedings
of the Royal Society, Edinburgh, A., 66 (1943), 28-30. 

[9] Pearson, Karl, “On the influence of natural selection on the variability and 
correlation of organs,” Philosophical Transactions of the Royal Society, 
London, A, 200 (1903), 1-66. 

[10] Raj, Des, “On estimating the parameters of bivariate normal populations
from doubly and singly, linearly truncated samples,” Sankhya, 12 (1953), 
277-90. 

[11] Wilks, S. S., “On estimates from fragmentary data,” Annals of Mathematical 
Statistics, 3 (1932), 163-96. 





COMPARISON OF SOME NON-PARAMETRIC TESTS 
AGAINST NORMAL ALTERNATIVES WITH AN 
APPLICATION TO LIFE TESTING* 


BENJAMIN EPSTEIN
Wayne University 


1. INTRODUCTION 


CONSIDER two normal populations $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$ with common
variance $\sigma^2$ (which can be assumed equal to one for convenience)
and with means $\mu_1$ and $\mu_2$ respectively. In this paper we test the
composite hypothesis $H_0\colon \mu_1 = \mu_2$ against the composite alternative
$H_1\colon \mu_1 \ne \mu_2$ on the basis of samples of size ten drawn from each
population. The performance and relative merits of four non-parametric test
procedures are studied experimentally. There is a close connection
between this paper and recent work by Dixon and Teichroew [1].
There is, however, an important difference due to the fact that we
assume that the two samples in question are placed on life test, so that
information about failures becomes available in an ordered way.¹ Two
of the non-parametric tests considered, the rank sum and run tests, 
require essentially that we have information about the times to failure 
of all items in both samples before we can reach a decision. The other 
two non-parametric tests, the exceedance and truncated maximum 
deviation tests, make essential use of the (time) ordered way in which 
failure data become available and thus make it possible to reach a de- 
cision long before all items fail. The experimental sampling carried out 
in the course of this work gives a good idea of the comparative power 
of the four test procedures and indicates how much one can expect 
to save on the average in the number of items failed in the course of 
reaching a decision. 


2. TESTS CONSIDERED IN THIS PAPER 


The current study is based on 200 experiments where each experi- 
ment consists of (a) drawing a sample of ten items at random from each 
of two standardized normal populations, (b) placing the two samples on 
life test, (c) applying each of four rules of action considered below. In 
order to make the various rules of action comparable, we have, when 





* Research sponsored by the Office of Ordnance Research of the U. S. Army under Contract No.
DA-20-018-ORD-13272. 

The author wishes to thank C. K. Tsao for his comments on this paper. 

1 For example, failures might be ordered in time. We find it convenient in this paper to think of the 
observations as being ordered in this way. 







COMPARISON OF SOME NON-PARAMETRIC TESTS 895 


necessary, used randomized rules of action so as to make the Type I
error the same for all test criteria. The particular Type I error chosen
in this paper was $\alpha = .05$. The difference between the two populations is
measured by the dimensionless parameter $d = |\mu_1 - \mu_2|/\sigma$.

Let $x_1 < x_2 < \cdots < x_{10}$ be one such ordered sample and let $y_1 < y_2
< \cdots < y_{10}$ be another such ordered sample. Let $z_1 < z_2 < \cdots < z_{20}$ be the
ordering of the combined sample of twenty. Then the rules of action
for the four criteria studied here are:


(a) The rank sum criterion S: 


Reject $H_0$ if the smaller (denoted as $S$) of the two rank sums is less
than or equal to 79. $H_0$ is accepted otherwise. See Wilcoxon [8].


(b) The run test criterion R: 


Reject $H_0$ if the total number of runs (denoted as $R$) is less than or
equal to 6. Accept $H_0$ if $R$ is greater than or equal to 8. When $R = 7$,
perform a Bernoulli trial which accepts $H_0$ with probability .04 (and
rejects $H_0$ with probability .96). See Swed and Eisenhart [6].


(c) The exceedance criterion E,: 


Let $x_r$ and $y_r$ be respectively the $r$th smallest observations in each of
the two samples and let $w_r = \max(x_r, y_r)$. If $w_r = x_r$ count the number of
$y$'s which are $\ge x_r$; if $w_r = y_r$ count the number of $x$'s which are $\ge y_r$.
Denoting this number of exceedances as $E_r$, the test procedure is as
follows: Reject $H_0$ if $E_r \le n_r - 1$; accept $H_0$ if $E_r \ge n_r + 1$. When $E_r = n_r$
perform a single Bernoulli trial which rejects $H_0$ with probability $p_r$.
In the present study attention was limited to the cases where $r = 1, 2, 3$.
The appropriate $n_r$ and $p_r$ are given in Table 1.


TABLE 1

    r      n_r     p_r
    1       6      .32
    2       4      .81
    3       3      .58





The theory underlying test procedures of the exceedance type is given 
in [3]. 

It may be useful to restate the exceedance rule in the language of life 
testing. If, for example, r= 1, the rule is 







(i) The life test is terminated by time $z_5$ at the latest ($z_5$ is the time
of occurrence of the fifth failure in both samples combined).

(ii) As soon as at least one failure in each population is observed,
terminate the life test with acceptance of $H_0$ provided that this
event occurs at time $z_4$ or earlier (that is, at $z_2$, $z_3$, or $z_4$).

(iii) If the first four failures all come from one of the two populations
and if the fifth failure comes from the other one of the two
populations, stop the test at $z_5$. Perform a Bernoulli trial which has
only two possible outcomes $A$ and $\bar{A}$ with $\Pr(A) = .32$ and
$\Pr(\bar{A}) = .68$. If a trial is made and $A$ occurs, reject $H_0$; if $\bar{A}$
occurs, accept $H_0$.

(iv) If the first 5 failures all come from the same population, stop
testing and reject $H_0$.
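
Steps (i) to (iv) amount to a short sequential procedure, which can be sketched as follows. This is an illustration under stated assumptions rather than a program from the paper: failures are represented as a string of labels `x` and `y` in order of occurrence, the function name is hypothetical, and the Bernoulli trial is drawn from Python's `random` module (any object with a `random()` method can be passed instead).

```python
import random


def exceedance_rule_r1(failure_labels, reject_probability=0.32, rng=random):
    """Apply steps (i)-(iv) to failure labels ('x' or 'y') in order of occurrence.

    Returns (decision, number_of_failures_observed). At least five labels
    are required, since the test may run to the fifth failure.
    """
    first = failure_labels[0]
    for count, label in enumerate(failure_labels[:5], start=1):
        if label != first:
            if count <= 4:
                # Step (ii): both populations have failed by z_4 at the latest.
                return "accept", count
            # Step (iii): first four from one population, fifth from the other.
            decision = "reject" if rng.random() < reject_probability else "accept"
            return decision, count
    # Step (iv): the first five failures all come from the same population.
    return "reject", 5


print(exceedance_rule_r1("yyxyxxyyxyxxxxyyxxyy"))
print(exceedance_rule_r1("xxxxxyyyyyxyxyxyxyxy"))
```

The first call uses the ordering of the worked example given later in this section; the second is a made-up ordering in which one population supplies the first five failures.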


(d) The maximum deviation criterion M,: 


This is a truncated maximum deviation test [7] with the truncation
taking place at a time not later than $u_r = \max(x_r, y_r)$; $r$ is decided upon
in advance. For a given $r$, the test procedure reads as follows: Keep
track of $M_r$, the absolute difference between the number of $x$ failures
and the number of $y$ failures and, (i) if at any time up to and including
$u_r = \max(x_r, y_r)$, $M_r = m_r + 1$, stop experimentation with the rejection of
$H_0$. Only the decision to reject can actually be made before time $u_r$;
(ii) if the test is actually carried to time $u_r$ and if $M_r = m_r$ at least once
(and is otherwise $\le m_r - 1$), then perform a Bernoulli trial which
rejects $H_0$ with probability $p_r$; (iii) if at all times up to and including
$u_r$, $M_r \le m_r - 1$, accept $H_0$. The values of $m_r$ and $p_r$ for $r = 1, 3, 6, 10$
are given in Table 2.


TABLE 2 








Mr 








If r=1, we have the exceedance case (with r=1). If r=10, we have the 
untruncated maximum deviation test of the type considered by 
Smirnov [5] and Massey [4]. 

It seems useful to illustrate the four test criteria by means of an
example. Let the first sample ($x_1 < x_2 < \cdots < x_{10}$) be (-.79, -.42,
-.39, -.29, -.04, .08, .13, .22, .92, 1.53) and let the second sample
($y_1 < y_2 < \cdots < y_{10}$) be (-1.10, -.89, -.55, -.34, -.31, -.26, .35,
.75, 1.56, 1.91). The rearrangement of the combined sample into
$z_1 < z_2 < \cdots < z_{20}$ is then (-1.10, -.89, -.79, -.55, -.42, -.39,
-.34, -.31, -.29, -.26, -.04, .08, .13, .22, .35, .75, .92, 1.53, 1.56,
1.91). Criteria (a), (b), (c), and (d) work out as follows on this example:
(a) In the combined sample of 20, the sum of the ranks of the $x$'s
is 108 and the sum of the ranks of the $y$'s is 102. Thus $H_0$ is
accepted since 102, the smaller of the two rank sums, exceeds 79.
(b) To apply the run test we write the combined ordered sample as
yyxyxxyyxyxxxxyyxxyy. The total number of runs equals 11 and
so $H_0$ is accepted.
(c) Exceedance criteria $E_r$ ($r = 1, 2, 3$) all lead to the acceptance of
$H_0$ after 3, 5, and 6 observations respectively.
(d) The maximum deviation criteria $M_r$ ($r = 1, 3, 6, 10$) all lead to the
acceptance of $H_0$ after 3, 6, 12, and 20 observations respectively.
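
The figures quoted in (a) to (d) can be reproduced directly from the two samples. The following sketch computes the rank sums, the number of runs, the exceedance counts $E_r$, the position of $w_r$ (or $u_r$) in the combined ordering, and the largest value reached by $M_r$ up to that point. The function names are illustrative; no accept or reject decision is attached to the maximum deviation statistic here.

```python
x = [-.79, -.42, -.39, -.29, -.04, .08, .13, .22, .92, 1.53]
y = [-1.10, -.89, -.55, -.34, -.31, -.26, .35, .75, 1.56, 1.91]

# Combined ordered sample with a label for the sample of origin.
combined = sorted([(v, "x") for v in x] + [(v, "y") for v in y])
labels = "".join(label for _, label in combined)

# (a) Rank sums.
rank_sum_x = sum(rank for rank, (_, lab) in enumerate(combined, start=1) if lab == "x")
rank_sum_y = sum(range(1, len(combined) + 1)) - rank_sum_x

# (b) Number of runs in the label sequence.
runs = 1 + sum(a != b for a, b in zip(labels, labels[1:]))


def exceedances(r):
    """Return (E_r, position of w_r in the combined ordering)."""
    xr, yr = sorted(x)[r - 1], sorted(y)[r - 1]
    w = max(xr, yr)
    other = y if w == xr else x
    e_r = sum(v >= w for v in other)
    position = 1 + sum(v < w for v, _ in combined)
    return e_r, position


def max_deviation(r):
    """Return (largest |#x failures - #y failures| up to u_r, position of u_r)."""
    _, position = exceedances(r)
    diff, largest = 0, 0
    for lab in labels[:position]:
        diff += 1 if lab == "x" else -1
        largest = max(largest, abs(diff))
    return largest, position


print(rank_sum_x, rank_sum_y, runs, labels)
print([exceedances(r) for r in (1, 2, 3)])
print([max_deviation(r) for r in (1, 3, 6, 10)])
```

With $n_r = 6, 4, 3$ from Table 1, each exceedance count here is at least $n_r + 1$, which is why all three exceedance criteria accept $H_0$.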


3. COMPARISON OF THE POWER OF THE FOUR TEST CRITERIA 


In the following table we summarize the experimental findings for the
200 pairs of samples, where each sample is of size ten. These samples
were drawn from Wold’s tables of random normal deviates [9]. Samples
corresponding to the cases where $d = |(\mu_1 - \mu_2)/\sigma| = 1, 2, 3$ were obtained
by adding 1, 2, and 3 to the $x_i$'s and leaving the $y_i$'s unchanged.
This was done for convenience. It might have been better to have used
200 different pairs of samples for each of the four values of $d$. The
author doubts, however, that this would make any appreciable change
in the overall pattern presented by Table 3.


TABLE 3

OBSERVED PROBABILITY OF ACCEPTING $H_0$ ($d = 0$) BASED ON
200 PAIRS OF SAMPLES, EACH OF SIZE TEN

         Rank     Run          Exceedance             Maximum Deviation
   d     Sum      Test     r=1     r=2     r=3       r=3     r=6     r=10

   0     .935     .965     .95     .96     .96       .955    .945    .945
   1     .485     .795     .655    .65     .60       .575    .555    .555
   2     .015     .275     .16     .12     .10       .065    .045    .045
   3     0        .02      .025    0       0         0       0       0







The following remarks are pertinent: 


(i) As r increases, there appears to be a slight improvement in the 
power of exceedance and maximum deviation tests. It happens 
that the truncated maximum deviation test for r=6 has the 
same experimental O.C. curve as the untruncated maximum 
deviation test for the particular samples being reported on in 
this paper. 

(ii) The maximum deviation test has slightly better power than 
the exceedance test for the particular samples being reported on 
in this paper. 

(iii) Ranked in order of power we have: Rank sum, best; run test, 
worst; exceedance and maximum deviation tests in between. 


In order to be able to make more positive and more general state- 
ments, particularly in (i) and (ii), we would need much more in the way 
of experimental evidence. To settle the question completely awaits the 
theoretical treatment of what seems to be a complicated analytical 
problem. 


4. AVERAGE SAMPLE SIZES 


It is of interest to report how many observations were required on
the average to reach a decision if one adopts a decision rule $E_r$ or a
decision rule $M_r$. In Table 4 we give data on rules $E_1$, $E_2$, and $E_3$ and
in Table 5 we give data on $M_3$.


TABLE 4 


AVERAGE NUMBER OF ITEMS FAILED IN REACHING A 
DECISION IF EXCEEDANCE RULE $E_r$ IS USED. THE
AVERAGES GIVEN ARE BASED ON WHAT WAS OB- 

SERVED IN 200 EXPERIMENTS, EACH EXPERI- 
MENT CONSISTING OF A PAIR OF SAMPLES 
EACH OF SIZE TEN 




























TABLE 5 


AVERAGE NUMBER OF ITEMS FAILED IN REACHING A 
DECISION IF TRUNCATED MAXIMUM DEVIATION 
RULE $M_r$ IS USED








Average Number (Based On 
200 Experiments) 











Tables 4 and 5 indicate clearly why exceedance and truncated 
maximum deviation procedures should be given serious consideration, 
if data become available in an ordered way, as they do in life testing. 
One may well be willing to sacrifice some power, if this means that one 
can in this way attain a substantial reduction in the number of items 
failed in the course of reaching a decision. Of course one should bear 
in mind that while the tests considered here will be reasonably effective 
in detecting whether the two distributions $f(x)$ and $g(y)$ differ in location,
they may be quite insensitive to other important differences. This
is an inherent difficulty in all non-parametric procedures and this is 
why the choice of a non-parametric procedure should be based at least 
in part on what we know about the underlying distributions in an a 
priori way or from data collected in the past. 


5. CONCLUSION 


We have presented the results of a sampling experiment of moderate 
size to indicate the possible usefulness of using exceedance or truncated 
maximum deviation tests when the data from each of two samples 
become available in an ordered way. The underlying distributions are 
assumed to be normal. While the experimental results in this paper are 
reported only for the case where the common sample size is ten, it is 
safe to conjecture that similar results would be found for other sample 
sizes. 

BIBLIOGRAPHY 
1. Dixon, W. J., and Teichroew, D., “Some sampling results on the power of
   nonparametric tests against normal alternatives,” Abstract, Annals of
   Mathematical Statistics, 25 (1954), 175.

2. Dixon, W. J., “Power under normality of several nonparametric tests,”
   Annals of Mathematical Statistics, 25 (1954), 610-14.

3. Epstein, B., “Tables for the distribution of the number of exceedances,”
   Annals of Mathematical Statistics, 25 (1954), 762-8.

4. Massey, F. J., Jr., “The distribution of the maximum deviation between two
   cumulative step functions,” Annals of Mathematical Statistics, 22 (1951),
   125-8.

5. Smirnov, N., “Tables for estimating the goodness of fit of empirical
   distributions,” Annals of Mathematical Statistics, 19 (1948), 279-81.

6. Swed, F. S., and Eisenhart, C., “Tables for testing randomness of grouping
   in a sequence of alternatives,” Annals of Mathematical Statistics, 14 (1943),
   66-87.

7. Tsao, C. K., “An extension of Massey’s distribution of the maximum deviation
   of two cumulative step functions,” Annals of Mathematical Statistics, 25
   (1954), 587-92.

8. Wilcoxon, F., “Probability tables for individual comparisons by ranking
   methods,” Biometrics, 3 (1947), 119-22.

9. Wold, H., “Random normal deviates,” Tracts for Computers No. 25,
   Cambridge University Press (1948).





ON THE DISTRIBUTION OF A POSITIVE RANDOM 
VARIABLE HAVING A DISCRETE PROBABILITY 
MASS AT THE ORIGIN* 


JOHN AITCHISON
University of Cambridge 


In a number of situations we are faced with the problem of 
determining efficient estimates of the mean and variance of a 
distribution specified by (i) a non-zero probability that the 
variable assumes a zero value, together with (ii) a conditional 
distribution for the positive values of the variable. This estimation
problem is analyzed and its implications for the Pearson type III,
exponential, lognormal and Poisson series conditional distributions
are investigated. Two simple examples
are given. 


1. THE PROBLEM 


THE nature of the problem to be discussed is best introduced by
examples. In a study of household expenditures it is often of interest
to estimate, from a sample of household budgets, the mean expenditure
per household on a certain commodity, say children’s clothing.
Over the period of the investigation it may well happen that a number 
of households in the sample spend nothing on children’s clothing where- 
as the expenditures by the remainder of the households necessarily 
arise from the distribution of a positive variable, probably skew and 
possibly approximated by a lognormal curve. If such is the case, then 
clearly the correct procedure in any analysis is to recognize explicitly 
this dichotomy of the population into the categories, spender and non- 
spender. This type of situation is not, however, confined to the case of 
a continuous variable; it occurs also for discrete variables. For exam- 
ple, in a household composition study we may wish to investigate the 
distribution of the number of children in a household. This distribution 
is sometimes Poisson except that the number of households with no 
children is considerably larger than is suggested by Poisson theory. 
Again one solution of the difficulty is to assume that there is a propor- 
tion of households containing no children while the remainder is dis- 
tributed as a truncated Poisson distribution. 

Such problems lead us to consider a random variable $X$ with the
following properties. There is a non-zero probability $\theta$ that $X$ is zero





* This paper is a development of some of the estimation problems discussed by Utting and Cole [5]. 
The author wishes to express his indebtedness to J. A. C. Brown of the Department of Applied Econom- 
ics for helpful criticiam and for suggesting the application of the Poisson series distribution to the analysis 
of household composition. 




and hence a probability $1-\theta$ that $X$ is non-zero; further the distribution
of $X$ conditional on $X \neq 0$ is some well-known distribution of a
positive variable, either continuous or discrete. This we may write:

$$P\{X = 0\} = \theta, \tag{1}$$
$$P\{X > 0\} = 1 - \theta, \tag{2}$$

and, for the continuous case,

$$P\{X \in (x, x + dx) \mid X > 0\} = g(x)\,dx, \tag{3}$$

where $g(x)$ is the conditional frequency function; and so

$$P\{X \in (x, x + dx)\} = (1 - \theta)\,g(x)\,dx, \quad x > 0. \tag{4}$$

If $\alpha$ and $\beta$ are the mean and variance respectively of the $g(x)$
distribution and $\gamma$ and $\delta$ are the corresponding parameters of $X$ then

$$\gamma = (1 - \theta)\alpha \tag{5}$$

and

$$\delta = (1 - \theta)\beta + \theta(1 - \theta)\alpha^2. \tag{6}$$

We discuss in this paper the problem of efficient estimation of $\gamma$
and $\delta$.
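The two moment relations, $\gamma = (1-\theta)\alpha$ and $\delta = (1-\theta)\beta + \theta(1-\theta)\alpha^2$, can be checked numerically. The following is a minimal sketch, not part of the original paper: the function names and the small discrete conditional distribution used for the check are hypothetical, chosen only for illustration.

```python
def mixture_moments(theta, alpha, beta):
    """Return (gamma, delta), the mean and variance of X.

    theta: probability that X is zero.
    alpha, beta: mean and variance of X given X > 0.
    """
    gamma = (1.0 - theta) * alpha
    delta = (1.0 - theta) * beta + theta * (1.0 - theta) * alpha ** 2
    return gamma, delta


def moments_by_enumeration(theta, values, probs):
    """Brute-force mean and variance of X for a discrete conditional law.

    values, probs: hypothetical support and probabilities of X given X > 0.
    The zero value contributes nothing to the first two raw moments.
    """
    mean = sum((1.0 - theta) * p * v for v, p in zip(values, probs))
    second = sum((1.0 - theta) * p * v ** 2 for v, p in zip(values, probs))
    return mean, second - mean ** 2
```

For any choice of `theta` and conditional distribution, the two functions should agree up to rounding, which is a quick way to confirm the $\theta(1-\theta)\alpha^2$ term in the variance.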
2. EFFICIENT ESTIMATION 


In this section we state some general results proved in the Appendix
which allow us, under certain circumstances, to obtain best unbiased
estimators of $\gamma$ and $\delta$; by the term best unbiased estimator we mean an
unbiased estimator having minimum attainable variance (see, for
example, Rao [3]). Let us suppose that, for the purpose of estimation,
we have available a random sample $S$ of size $n$ from the population and
that $r$ of the sample values are zero while the remaining $(n-r)$ are
$x_1, x_2, \ldots, x_{n-r}$. Then the following results hold.

(i) If, for a sample of size $m$ from the $g(x)$ population a sufficient
unbiased estimator of $\alpha$, say $a_{(m)}$, exists then

$$c = \begin{cases} \left(1 - \dfrac{r}{n}\right) a_{(n-r)}, & r < n, \\ 0, & r = n, \end{cases} \tag{7}$$

is a best unbiased estimator of $\gamma$.
The twofold definition of $c$ is necessary since $a_{(n-r)}$ is not defined for
$r = n$. If $a_{(m)}$ is the arithmetic mean of $m$ sample values then $c$ becomes
the mean of the sample $S$ (including zero values), namely

$$c = \frac{1}{n} \sum_{i=1}^{n-r} x_i, \tag{8}$$

and the variance of the estimator in this case is

$$\operatorname{var}\{c\} = \frac{\delta}{n}. \tag{9}$$


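A result similar to (i) holds for $\delta$ provided that jointly sufficient
estimators of $\alpha$ (and hence of $\alpha^2$) and $\beta$ exist.
(ii) If $e_{(m)}$ and $f_{(m)}$ are jointly sufficient unbiased estimators of $\alpha^2$
and $\beta$ respectively for a sample of size $m$, then

$$d = \begin{cases} \left(1 - \dfrac{r}{n}\right) f_{(n-r)} + \dfrac{r}{n}\left(1 - \dfrac{r-1}{n-1}\right) e_{(n-r)}, & r < n, \\ 0, & r = n, \end{cases} \tag{10}$$

is a best unbiased estimator of $\delta$.
It is seldom that such jointly efficient estimators of $\alpha$ and $\beta$ occur.
Often, however, $\beta$ depends on $\alpha$ so that $a_{(m)}$ is sufficient for both $\alpha$ and
$\beta$; an important case is $\beta = K\alpha^2$, for which we have the following
property.
(iii) If $\beta = K\alpha^2$ and $a_{(m)}$, the sufficient unbiased estimator of $\alpha$, is the
sample mean then

$$\delta = (1 - \theta)(K + \theta)\alpha^2 \tag{11}$$

and

$$d = \begin{cases} \dfrac{\left(1 - \dfrac{r}{n}\right)K + \dfrac{r}{n} - \dfrac{r(r-1)}{n(n-1)}}{1 + \dfrac{K}{n-r}}\; a_{(n-r)}^2, & r < n, \\ 0, & r = n, \end{cases} \tag{12}$$

is a best unbiased estimator of $\delta$.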
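As an illustration of results (i) and (iii), the sketch below computes $c$ and $d$ from a sample in the case $\beta = K\alpha^2$ with the sample mean as the sufficient estimator of $\alpha$. It is a reading of the estimators under stated assumptions rather than the author's own computation: the function name is hypothetical, $\alpha^2$ is estimated without bias by $\bar{x}^2/(1 + K/(n-r))$, and $\theta(1-\theta)$ by $r(n-r)/\{n(n-1)\}$.

```python
def estimate_gamma_delta(sample, K):
    """Estimates (c, d) of the mean and variance of X when beta = K * alpha**2.

    sample: all n observations, zeros included (n >= 2 is assumed).
    K: known ratio of the conditional variance to the squared conditional mean.
    """
    n = len(sample)
    positives = [x for x in sample if x > 0]
    m = len(positives)          # m = n - r non-zero values
    r = n - m                   # number of zero values
    if m == 0:                  # r = n: both estimators are defined to be zero
        return 0.0, 0.0
    xbar = sum(positives) / m   # sample mean of the non-zero values
    c = (1.0 - r / n) * xbar    # equals the mean of the whole sample
    # (1 - r/n) K estimates (1 - theta) K; r(n - r) / (n(n - 1)) estimates theta (1 - theta)
    weight = (1.0 - r / n) * K + r * (n - r) / (n * (n - 1))
    d = weight * xbar ** 2 / (1.0 + K / m)
    return c, d
```

With `K = 1` (the exponential case) the value of `d` reduces algebraically to $(n+r-1)\{\sum x_i\}^2 / \{n(n-1)(n-r+1)\}$, which gives a convenient hand check on small samples.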


3. APPLICATION TO PARTICULAR DISTRIBUTIONS WITH EXAMPLES 


It is interesting to apply the estimator procedure of the preceding 
section to a number of particular conditional distributions. 





(i) Pearson type III distribution
For the Pearson type III distribution

$$g(x) = \left(\frac{p}{\alpha}\right)^{p} \frac{x^{p-1} e^{-px/\alpha}}{\Gamma(p)}, \quad x > 0, \tag{13}$$

where we assume $p$ to be known. For the mean $\alpha$ of this distribution
the sample mean is a sufficient unbiased estimator so that

$$c = \frac{1}{n} \sum_{i=1}^{n-r} x_i \tag{14}$$

is a best unbiased estimator of $\gamma = (1-\theta)\alpha$. Here $\beta = \alpha^2/p$ and so

$$\delta = (1 - \theta)\left(\frac{1}{p} + \theta\right)\alpha^2, \tag{15}$$

and a best unbiased estimator of $\delta$ is given by (12) with $K = 1/p$.
(ii) Exponential distribution
For the exponential distribution,

$$g(x) = \frac{1}{\alpha} e^{-x/\alpha}, \quad x > 0, \tag{16}$$

and this is the special case $p = 1$ of the Pearson type III distribution
so that no new theory arises. The estimator of $\delta$, however, simplifies to

$$d = \begin{cases} \dfrac{n + r - 1}{n(n-1)(n-r+1)} \left\{\sum_{i=1}^{n-r} x_i\right\}^2, & r < n, \\ 0, & r = n. \end{cases} \tag{17}$$


(iii) Lognormal distribution 


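If the conditional distribution is lognormal with parameters $\mu$ and
$\sigma^2$ then

$$g(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2\sigma^2}(\log x - \mu)^2\right\}, \quad x > 0. \tag{18}$$

Here

$$\alpha = e^{\mu + \frac{1}{2}\sigma^2}, \tag{19}$$
$$\beta = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right), \tag{20}$$

so that

$$\gamma = (1 - \theta)\, e^{\mu + \frac{1}{2}\sigma^2}, \tag{21}$$

and

$$\delta = (1 - \theta)\, e^{2\mu + \sigma^2}\left\{e^{\sigma^2} - (1 - \theta)\right\}. \tag{22}$$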







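Finney [2] has obtained by considerations similar to those of this paper
best unbiased estimators of $\alpha$ and $\beta$, and as an extension of his theory
it may be shown that

$$c = \begin{cases} \left(1 - \dfrac{r}{n}\right) e^{\bar{y}}\, \psi_{n-r}\!\left(\tfrac{1}{2}s^2\right), & r < n-1, \\[1ex] \dfrac{x_1}{n}, & r = n-1, \\[1ex] 0, & r = n, \end{cases} \tag{23}$$

$$d = \begin{cases} \left(1 - \dfrac{r}{n}\right) e^{2\bar{y}} \left\{\psi_{n-r}(2s^2) - \dfrac{n-r-1}{n-1}\, \psi_{n-r}\!\left(\dfrac{n-r-2}{n-r-1}\, s^2\right)\right\}, & r < n-1, \\[1ex] \dfrac{x_1^2}{n}, & r = n-1, \\[1ex] 0, & r = n, \end{cases} \tag{24}$$

are best unbiased estimators of $\gamma$ and $\delta$ respectively, where

$$y_i = \log x_i, \quad i = 1, 2, \ldots, n-r, \tag{25}$$

$$\bar{y} = \frac{1}{n-r} \sum_{i=1}^{n-r} y_i, \quad r < n-1, \tag{26}$$

$$s^2 = \frac{1}{n-r-1} \sum_{i=1}^{n-r} (y_i - \bar{y})^2, \quad r < n-1, \tag{27}$$

$$\psi_m(t) = 1 + \frac{m-1}{m}\, t + \frac{(m-1)^3}{2!\, m^2 (m+1)}\, t^2 + \frac{(m-1)^5}{3!\, m^3 (m+1)(m+3)}\, t^3 + \cdots.^1 \tag{28}$$

It can be shown that $\frac{1}{n}\sum_{i=1}^{n-r} x_i$ is less efficient than $c$ in this case.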
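The series $\psi_m(t)$ and the estimator $c$ of (23) are straightforward to evaluate numerically. The sketch below is illustrative only: the function names are hypothetical, and the general term of the series, $(m-1)^{2j-1} t^j / \{m^j\, j!\, (m+1)(m+3)\cdots(m+2j-3)\}$, is an assumption inferred from the pattern of the first three terms quoted in (28).

```python
import math


def psi(m, t, tol=1e-15, max_terms=200):
    """Evaluate psi_m(t) by summing its power series term by term.

    Assumed general term for j >= 2 (inferred from the printed terms):
    (m-1)**(2j-1) * t**j / (m**j * j! * (m+1)(m+3)...(m+2j-3)).
    """
    total = 1.0
    term = (m - 1) * t / m                     # j = 1 term
    j = 1
    while abs(term) > tol and j <= max_terms:
        total += term
        j += 1
        # ratio of term j to term j-1 under the assumed general term
        term *= (m - 1) ** 2 * t / (m * j * (m + 2 * j - 3))
    return total


def lognormal_c(sample):
    """Estimator c of the overall mean for a lognormal conditional distribution.

    sample: all n observations, zeros included.
    """
    n = len(sample)
    logs = [math.log(x) for x in sample if x > 0]
    m = len(logs)                              # m = n - r
    if m == 0:                                 # r = n
        return 0.0
    if m == 1:                                 # r = n - 1: c = x_1 / n
        return math.exp(logs[0]) / n
    ybar = sum(logs) / m
    s2 = sum((y - ybar) ** 2 for y in logs) / (m - 1)
    return (m / n) * math.exp(ybar) * psi(m, 0.5 * s2)
```

For large `m` the series approaches $e^t$, so `psi(m, t)` can be sanity-checked against `math.exp(t)`.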
Example.² In a household expenditure inquiry carried out by the
Ministry of Food in 1950, a sample of 1143 British households was 
¹ A table of values of the function $\psi_m(t)$ will be published in a forthcoming monograph on the
lognormal distribution by the Department of Applied Economics, University of Cambridge.
² The data for this and the following example have been obtained from The National Food Survey
by courtesy of the Ministry of Food.
















taken; one of the items in the classification gives expenditures in pence
per household on sweet biscuits. Of the 1143 households only 512
bought this commodity and their expenditures appear to come from a
lognormal population. The relevant sample data are:

$$n = 1143, \qquad \bar{y} = 3.664,$$
$$r = 631, \qquad s^2 = 0.3721.$$

The estimator $c$ of $\gamma$ obtained from (23) is then

$$c = \left(1 - \frac{631}{1143}\right) e^{3.664}\, \psi_{512}(0.1860) = 20.95,$$

as compared with the value of 20.25 given by the ordinary sample mean.
The value of $d$ in this case is 4481 so that the estimated standard error
of the sample mean is 1.97 and the standard error of $c$ is necessarily
less than this.


(iv) Truncated Poisson distribution 


The truncated Poisson distribution³ provides an application of the
theory to a discrete variable. In this case

$$g(x) = P\{X = x \mid X > 0\} = \frac{e^{-\nu}}{1 - e^{-\nu}}\, \frac{\nu^x}{x!}, \quad x = 1, 2, \ldots, \tag{29}$$

$$\alpha = \frac{\nu}{1 - e^{-\nu}}. \tag{30}$$

The sample mean is again a sufficient unbiased estimator⁴ of $\alpha$ and
so $c$ as given by (8) is a best unbiased estimator of $\gamma$ where

$$\gamma = \frac{(1 - \theta)\nu}{1 - e^{-\nu}}. \tag{31}$$

It does not seem possible to find a simple expression for the estimator
$d$ in this case.





³ See, for example, David and Johnson [1].
⁴ See Tukey [4].


Example. The data of Table 1 analyze by number of children (under 
fourteen years of age) per household a sample of 4021 British house- 
holds in 1950. 


TABLE 1

NUMBER OF HOUSEHOLDS CONTAINING GIVEN NUMBER OF CHILDREN

No. of Children 0 1 2 3 4
(i) Observed 831 67
(ii) Poisson 28
(iii) Truncated Poisson 81



































Rows (ii) and (iii) of the table show the results of fitting a complete 
Poisson distribution (estimated $\nu = 0.773$) and a truncated Poisson
distribution (estimated $\nu = 1.33$) as described by (29). The complete
Poisson distribution clearly does not give an adequate fit due to the 
extra large proportion of households with no children; the truncated 
distribution gives a better approximation and a best unbiased estimate 
of the mean number of children per household is simply the sample 
mean 0.773. 


4. FURTHER CONSIDERATIONS 


The limitations that the discrete probability mass is at the origin 
rather than at some other point, and that the conditional variable is 
essentially positive, may be removed without unduly complicating 
the theory; we have not thought it worth-while to develop this exten- 
sion because of the lack of any obvious practical application. It would 
also have been interesting to compare the efficiency of other possible 
estimators with the estimators derived for the special distributions but 
this particular problem has also been left aside. 


APPENDIX: DERIVATION OF THEORETICAL RESULTS 


In this Appendix we derive the theoretical results of Section 2. The essential
idea underlying the proofs is a property of jointly sufficient estimators: any
function of jointly sufficient estimators is a best unbiased estimator of its
expectation (cf. Rao [3, p. 149]).

As in Section 2 the random sample $S$ consists of $r$ zero values and $(n-r)$
other values $x_1, \ldots, x_{n-r}$. For the $g(x)$ population, $a_{(m)}$ is a sufficient unbiased
estimator of $\alpha$ for a sample of $m$ values. If the distribution of $X$ depends on
parameters $\lambda, \ldots$ in addition to $\theta$ and $\alpha$, then the likelihood function $L$ of the sample
may be written in the form

$$L(S \mid \theta, \alpha, \lambda, \ldots) = \binom{n}{r} \theta^r (1 - \theta)^{n-r}\, h(a_{(n-r)};\, \alpha, \lambda, \ldots)\, k(S \mid \lambda, \ldots)$$

where $h$ is a frequency function containing the sample values only in the form
$a_{(n-r)}$, and $k$ is a frequency function independent of $\theta$ and $\alpha$. Hence

$$L = L_1\!\left(\frac{r}{n},\, a_{(n-r)};\, \theta, \alpha, \lambda, \ldots\right) L_2(S \mid \lambda, \ldots)$$

where $L_1$ is a frequency function containing the sample values only in the forms
$r/n$ and $a_{(n-r)}$, and $L_2$ is a frequency function independent of $\theta$ and $\alpha$. Thus $r/n$
and $a_{(n-r)}$ are jointly sufficient estimators of $\theta$ and $\alpha$. Consider the sample
function:

$$c = \begin{cases} \left(1 - \dfrac{r}{n}\right) a_{(n-r)}, & r < n, \\ 0, & r = n. \end{cases}$$

The expectation of the estimator is given by

$$\begin{aligned}
E\{c\} &= P\{r = n\}\, E\{c \mid r = n\} + P\{r < n\}\, E\{c \mid r < n\} \\
&= P\{r < n\}\, E_{r \mid r < n}\!\left[\left(1 - \frac{r}{n}\right) E_{x \mid r}\!\left(a_{(n-r)}\right)\right] \\
&= P\{r < n\}\, E_{r \mid r < n}\!\left[\left(1 - \frac{r}{n}\right) \alpha\right] \\
&= \alpha\, E\!\left(1 - \frac{r}{n}\right) \\
&= (1 - \theta)\alpha \\
&= \gamma,
\end{aligned}$$

so that, from the property of sufficient estimators referred to above, $c$ is a best
unbiased estimator of $\gamma$.


The proofs of the results (ii) and (iii) of Section 2 proceed in exactly the same 
manner and their details need not be reproduced here. 


REFERENCES 


[1] David, F. N., and Johnson, N. L., “The truncated Poisson,” Biometrics, 8 
(1952), 275-85. 

[2] Finney, D. J., “On the distribution of a variate whose logarithm is normally 
distributed,” Journal of the Royal Statistical Society, Series B, 7 (1941), 
155-61. 

[3] Rao, C. R., Advanced Statistical Methods in Biometric Research. New York: 
John Wiley and Sons, Inc., 1952. 

[4] Tukey, John W., “Sufficiency, truncation and selection,” Annals of Mathe- 
matical Statistics, 20 (1949), 309-11. 

[5] Utting, J. E. G., and Cole, Dorothy, “Sample surveys for the social accounts 
of the household sector,” Bulletin of the Oxford Institute of Statistics, 15 
(1953), 1-24. 










INCREASING THE EFFECTIVE LENGTH OF SHORT 
TIME-SERIES FOR THE PURPOSE OF ESTIMATING 
AUTOREGRESSIVE PARAMETERS* 


ABBOTT S. WEINSTEIN
New York State Department of Commerce 


Techniques of overcoming biases in serial correlation coef- 
ficients computed from short linear autoregressive time-series 
are considered. Two alternative estimates of autocorrelation 
coefficients are suggested as probable improvements over the 
usual estimates based on averages of observations made over 
a seasonal period. The alternative estimates are obtained, for
example, from 12 yearly series, one for each month, and are: 

1. an average of serial coefficients computed from each 
yearly series; 

2. pooled sums of squares and lagged products for each 
series formed into a single serial correlation coefficient. 


IT IS well known that the application of conventional least squares
regression analysis to economic time-series is complicated by (1)
autocorrelated error terms, (2) errors of observation, and (3) simultaneous
interrelationships among the variables.¹ Even in the uncomplicated
case where the error term is distributed normally, the maximum
likelihood solution may lead to estimators that are biased to
order 1/n, as noted by Kendall [8]. Where only one of the complica- 
tions is present, large-sample methods of overcoming it are available, 
but these are similarly biased for short series. With more than one of 
the complications, the problem is greater and a much larger number of 
observations is essential.²

In view of the brevity of most economic series, the effect of these 
considerations is especially serious. It is the purpose of this paper to 
describe methods designed to increase the effective number of observa- 
tions of the series, thereby reducing the short-series limitations of 
available analytical techniques. The discussion deals specifically with 
the elimination of bias from serial correlation coefficients used in the 
estimation of autoregressive parameters. 

A series of annual data consists either of annual averages of periodic 
observations, or of one observation per year made at roughly the same 





* The author wishes to express his appreciation to Alfred D. Basch and Ethel E. Metsendorf of the 
New York State Department of Commerce and Joseph Lev of the New York State Department of Civil 
Service for their encouragement and helpful comments. 

¹ Cochrane and Orcutt [4].

² Orcutt and Cochrane [12].


date each year. We consider here series of annual data that can be 
represented usefully by the two-lag autoregressive process, 


(1)  $x_{t+2} + \alpha_1 x_{t+1} + \alpha_2 x_t = v_{t+2}$,


where $v_t$ is a random disturbance with $E\{v_t\} = 0$ and the $t$’s are
equidistant points in time. The specific problem considered is the
elimination of bias from the serial correlation coefficients used to estimate the
$\alpha$’s.

As shown by Orcutt [11], the serial correlation coefficient is biased in 
two respects: downward as a result of using the sample mean where the 
parent mean is not known, and toward zero because of the skewed 
distribution of the coefficients. Since the first two serial coefficients are 
positive in most economic series, the entire bias in these coefficients 
will usually be downward. In the method of estimating the $\alpha$’s considered
below, serial correlation coefficients are used for the first two
lags only. 

A correction for downward bias has been suggested by Quenouille
[14]. Based on his conclusion, drawn from an empirical study, that the
downward bias of a serial correlation coefficient is inversely proportional
to $n$, Quenouille proposed replacing $r_s$ with


(2)  $r_s$ (corrected for bias) $= 2r_s - \tfrac{1}{2}({}_1r_s + {}_2r_s)$


where $r_s$ is the serial correlation coefficient for the $s$th lag as computed
from a given series, and ${}_1r_s$ and ${}_2r_s$ are the corresponding coefficients
based on each of the two halves of the same series.
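The reason this correction removes a bias of order $1/n$ can be seen in a few lines. The sketch below is an illustration under an explicit assumption, namely that the expected coefficient is exactly $\rho - b/n$ for some constant $b$; the function name and the numerical values are hypothetical.

```python
def quenouille_correct(r_full, r_first_half, r_second_half):
    """Quenouille's correction: 2 r_s minus the mean of the two half-series values."""
    return 2.0 * r_full - 0.5 * (r_first_half + r_second_half)


def expected_coefficient(rho, b, n):
    """Assumed bias model for illustration: E[r] = rho - b / n."""
    return rho - b / n
```

Under the assumed model a series of length $n$ gives $\rho - b/n$ and each half gives $\rho - 2b/n$, so the corrected value is $2(\rho - b/n) - (\rho - 2b/n) = \rho$. The same arithmetic shows why the correction is fragile for very short series: the half-series coefficients it relies on are each based on only $n/2$ observations.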

The correction, intriguing because of its simplicity, loses some of its 
appeal when a series of, say, only 14 annual observations is involved. 
Using the correction would require basing ${}_1r_s$ and ${}_2r_s$ on series with
$n = 7$. Most economic series are internally comparable for very few
years, and serial correlation coefficients estimated from such short series
will be badly biased.⁴ Hence, where we need it most, the correction is
of the least apparent use. 

What is needed, since it is practically impossible to get observations 
for more years, is an approach that will give us more annual observa- 
tions for the years for which data are available. Where data are ob- 
served monthly, we have not only one annual series, but actually 12 
such series, one for each month. 

In order for the methods proposed here to be valid, it is necessary that 





³ For a discussion of the autoregressive scheme, see Wold [23], Bartlett [2], and Kendall [6, 7, 8].
⁴ Sastry [19] has shown that there is appreciable bias in certain autoregressive series of 200
observations or more.


each of the 12 series have the same parametric structure. While objec- 
tive tests of this assumption are not known, it does have some logical 
and empirical foundation. We are, in fact, attempting to analyze annual 
serial movement by assuming a particular type of autoregressive process 
and then estimating the generating constants. If this approach is to 
have any meaning, it would seem that the annual movement should 
characterize the whole year, not an isolated month of the year. This is 
not very different from the assumption inherent in using the series of 
annual averages, themselves, as representative of the entire annual 
movement. Moreover, graphic records of actual series prepared by the 
author tend to support the assumption of a single parametric struc- 
ture. Nevertheless, it is highly advisable to examine this assumption 
as it applies to a particular series before applying the methods sug- 
gested here. 

The model (1) requires further the assumption of stationarity—that 
the series have no trend. In practice, however, especially in studying 
short series, it is often regarded as prudent to use original observations 
that are only approximately stationary, since the effect of trend removal 
is uncertain and possibly more detrimental than the existence of a mild 
trend. Some other models which are in some ways preferable to (1) 
(Orcutt’s [11], for example) are, in themselves, not stationary. 

Where the 12 series can be treated as if they have the same parametric
structure, two possibilities suggest themselves: I. Coefficients,
$r_{sj}$, might be computed for each lag, $s$, from each of the 12 annual
series, and the mean of the 12 $r_{sj}$’s obtained. The mean would have the
same bias as the component $r_{sj}$’s and an appropriate correction for bias
would have to be applied; II. The sums of squares and lagged products
for each series might be pooled to produce a single serial correlation
coefficient, $r_s$. Method II is equivalent to computing, from the single
long series of monthly observations, serial correlation coefficients with
a lag 12 times the annual lag. We have, where the mean of the series is
zero,

(3)  $$\frac{\sum_{j=1}^{12} \sum_{i=1}^{n-s} x_{ij}\, x_{i+s,j}}{\sqrt{\sum_{j=1}^{12} \sum_{i=1}^{n-s} x_{ij}^2 \; \sum_{j=1}^{12} \sum_{i=1+s}^{n} x_{ij}^2}} = \frac{\sum_{i=1}^{N-k} x_i\, x_{i+k}}{\sqrt{\sum_{i=1}^{N-k} x_i^2 \; \sum_{i=1+k}^{N} x_i^2}}$$

which approximates $r_s$ and $r_k$, where year $i = 1, 2, \ldots, n$; month
$j = 1, 2, \ldots, 12$ = Jan., Feb., ..., Dec.; $N = 12n$; $k = 12s$. This is
almost analogous to rearranging the series into one long annual series
consisting of first the $n$ January observations, followed by the $n$ February
observations, etc. However, as is implicit in (3), the series is
“segmented”; it should be clear that this is necessary, since there would
be no meaning in pairing the last terms of the series of January observations
with the first terms of the February series, etc. Even the longer
series may be biased, and a correction is probably necessary.

In order to take advantage of Quenouille’s correction in short series, 
the following methods are proposed: 


OVERCOMING BIAS, METHOD I 
An estimate of $\rho_s$, corrected for bias, is

(4)  $r_s$ (corrected for bias) $= 2\bar{r}_{s(j,j+1)} - \bar{r}_{sj}$,

where $s$ is the number of years lagged, $j$ is the month, and where

(5)  $$\bar{r}_{s(j,j+1)} = \frac{1}{6} \sum_{j} r_{s(j,j+1)}, \quad j = 1, 3, \ldots, 11,$$

and

(6)  $$\bar{r}_{sj} = \frac{1}{12} \sum_{j=1}^{12} r_{sj}, \quad j = 1, 2, \ldots, 12.$$

The basic serial correlation coefficients⁵ are computed from each of
the individual segments, $j$, of the series of variates, $X_{ij}$,

(7)  $$r_{sj} = \frac{2 \sum_{i=1}^{n-s} X_{ij} X_{i+s,j} - 2(n-s)\bar{X}_j^2}{\sum_{i=1}^{n-s} X_{ij}^2 + \sum_{i=1+s}^{n} X_{ij}^2 - 2(n-s)\bar{X}_j^2}.$$

The $r_{sj}$’s are averaged according to (6). Combining the 12 segments in
pairs to form six longer segments, the coefficient based on any one of
the combined segments is

(8)  $$r_{s(j,j+1)} = \frac{2 \sum_{i=1}^{n-s} X_{ij} X_{i+s,j} + 2 \sum_{i=1}^{n-s} X_{i,j+1} X_{i+s,j+1} - 4(n-s)\bar{X}_{j,j+1}^2}{\sum_{i=1}^{n-s} X_{ij}^2 + \sum_{i=1+s}^{n} X_{ij}^2 + \sum_{i=1}^{n-s} X_{i,j+1}^2 + \sum_{i=1+s}^{n} X_{i,j+1}^2 - 4(n-s)\bar{X}_{j,j+1}^2}$$

where $j$ is the segment, $n$ is the number of observations in either segment,
and the single mean of the two series, $\bar{X}_{j,j+1}$, is

$$\bar{X}_{j,j+1} = \frac{1}{2n} \sum_{i=1}^{n} (X_{ij} + X_{i,j+1}).$$

Except for $\bar{X}_{j,j+1}$, all the elements of the formula are available directly
from computations already made in obtaining (7). The $r_{s(j,j+1)}$’s are
averaged according to (5).

⁵ In this discussion and in the illustration which follows, a modified definition of the serial
correlation coefficient is used. This definition and several others are discussed by the author in another paper
which is in progress.


OVERCOMING BIAS, METHOD II 


With as many observations as we have in the 12 series, we might 
prefer to compute a single coefficient for each lag from the whole series 
since the bias is a function of the size of the sample. 

Thus we might apply Quenouille’s correction directly to the entire
“long series”:


(9)  $r_s$ (corrected for bias) $= 2r_s - \tfrac{1}{2}({}_1r_s + {}_2r_s)$


where $r_s$ is the serial correlation coefficient for the $s$th lag as computed
from a given series, and ${}_1r_s$ and ${}_2r_s$ are the corresponding coefficients
based on each of the two halves of the same series.

As discussed above, the long series should be treated as if it were still
segmented; i.e., $r_s$ based on the whole series is defined by

(10)  $$r_s = \frac{2 \sum_{j=1}^{12} \sum_{i=1}^{n-s} X_{ij} X_{i+s,j} - 24(n-s)\bar{X}^2}{\sum_{j=1}^{12} \sum_{i=1}^{n-s} X_{ij}^2 + \sum_{j=1}^{12} \sum_{i=1+s}^{n} X_{ij}^2 - 24(n-s)\bar{X}^2}$$

where $n$ is the number of observations, $i$, in each segment, $j$, and $\bar{X}$ is
the mean of all the variates in the series. We are limited in the gain in
“length” we might expect, since using (10), “end-effects”⁶ will retain
the same proportion of the variation they had in (7) and (8). Where
the series is segmented, the expected proportion of “end-effects” is not
represented by the ratio $(n-s)/n$ which would approach unity if the
series were really a single set of observations. Instead, the proportion
is $(n-gs)/n$ (where $g$ is the number of segments) and remains a constant.
Therefore, even the estimate based on the long series (segmented)
will be considerably biased and the application of Quenouille’s correction
(9) should lead to an improvement.⁷

⁶ “End-effects,” absent in ordinary correlation, are found in serial correlation and arise from the
fact that in the numerator of the formula for $r_s$, for example, the first and last $s$ terms are paired only
once, while the middle $n-2s$ terms are each used twice.
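Formulas (7), (8) and (10) share one structure: lagged products and sums of squares are pooled over a group of segments, and a correction based on the group's common mean is subtracted. A minimal sketch of both methods follows. The function names are hypothetical; the code assumes segments of equal length and an even number of them, and it takes the two "halves" of the long series to be the first and last halves of the list of segments, which is an interpretation rather than something the text spells out.

```python
def serial_coef(segments, s):
    """Modified serial coefficient for lag s, pooled over a group of segments.

    One segment gives formula (7), a pair gives (8), all twelve give (10).
    """
    n = len(segments[0])
    g = len(segments)
    mean = sum(sum(seg) for seg in segments) / (g * n)
    lagged = sum(seg[i] * seg[i + s] for seg in segments for i in range(n - s))
    head = sum(seg[i] ** 2 for seg in segments for i in range(n - s))
    tail = sum(seg[i] ** 2 for seg in segments for i in range(s, n))
    correction = 2 * g * (n - s) * mean ** 2   # 2(n-s), 4(n-s) or 24(n-s) times mean squared
    return (2 * lagged - correction) / (head + tail - correction)


def method_one(segments, s):
    """Method I: twice the mean paired coefficient minus the mean single coefficient."""
    singles = [serial_coef([seg], s) for seg in segments]
    pairs = [serial_coef(segments[j:j + 2], s) for j in range(0, len(segments), 2)]
    return 2 * sum(pairs) / len(pairs) - sum(singles) / len(singles)


def method_two(segments, s):
    """Method II: Quenouille's correction applied to the segmented long series."""
    half = len(segments) // 2
    whole = serial_coef(segments, s)
    first = serial_coef(segments[:half], s)
    second = serial_coef(segments[half:], s)
    return 2 * whole - 0.5 * (first + second)
```

Here `segments` would be a list of twelve lists, one per month, each holding the $n$ yearly observations for that month. Lagged products are formed only within a segment, so the last terms of one month are never paired with the first terms of the next.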

The Methods Illustrated. We use Kendall’s [7] experimental series
No. 1, which had been constructed by Kendall to conform to (1) with
$\alpha_1 = -1.1$ and $\alpha_2 = +0.5$. His $v_t$ was based on tables of random numbers
and is distributed rectangularly from $-49$ to $+49$. In a series of infinite
length having the same constants, $\rho_1 = +.733$ and $\rho_2 = +.306$.

In order that the experimental series represent the 12 annual series or 
the single long segmented series discussed above, it has been broken 
into 12 segments (“months”) of 14 observations (“years”) each. Series 
of roughly this length are likely to be encountered in practice and have 
received considerable attention in the literature. The first segment 
consists of the first 14 terms of the original series; the second, of terms 
15 through 28; etc. In this manner, the first 168 of Kendall’s 480 terms 
are used. The 12 segments have also been combined to form a series 
analogous to annual averages. We have, as terms of the “average” 
segment, 


$$X_i = \sum_{j=1}^{12} X_{ij}.$$
Actually, the segment consists of totals rather than averages; this saves 
a step in the computation without changing the value of the coeffi- 
cients. The coefficients computed from specified segments are shown in 
Table I. 

The coefficients based on the “average” segment are +.6301 for
$s = 1$ and +.0493 for $s = 2$. Only one of the $r_{1j}$’s and two of the $r_{2j}$’s are
higher than their respective true value; with no bias, roughly half would
be higher. The bias is directly indicated by the fact that the means of
the $r_{1j}$’s and $r_{2j}$’s are each far below the known parameters, as are
the $r_s$’s based on the average segment.

The annual averages have the advantage of being independent of 
seasonality and monthly serial correlation. In this example, moreover, 
as should be expected, serial coefficients computed from the average 
segment are higher than the means of those of the 12 series. This is 
partly because of the smoothing effect of averaging and partly because 
the mean of the series of averages is a better estimator of the true mean 
than is that of any of the 12 segments. However, where n is as small 
as in the examples considered here, estimates based on the series of 
annual averages are also excessively biased, and with so short a single 





⁷ Even where the series is not segmented, bias due to the mean, alone, is found in certain series with
200 observations or more, as noted above.

TABLE I

SERIAL CORRELATION COEFFICIENTS COMPUTED FROM INDIVIDUAL AND GROUPED SEGMENTS OF KENDALL’S SERIES 1*

($\rho_1 = +.733$; $\rho_2 = +.306$)

Serial Correlation Coefficient (and formula number)
Columns, in order: Segment $j$; Method I: $r_{1j}$ (7), $r_{1(j,j+1)}$ (8); Method II: ${}_1r_1$, ${}_2r_1$ (10)†, $r_1$ (10); Method I: $r_{2j}$ (7), $r_{2(j,j+1)}$ (8); Method II: ${}_1r_2$, ${}_2r_2$ (10)†, $r_2$ (10)
1 4577 ) ) .0875 ‘ 
6174 -3517 
2 .2539 — .1995 
3 2780 — .0571 
¥ rors )7003 onl -3619 | 1.3499 
4 .7978 -_ 
5 6426 - 
} sus —.0054 
6 5567 y .0232 
panne pa 
7 on ) - J 
6133 — .0277 
~ 6765 1257 
9 5334 — 4290 
box 6359 } oo 0420 
10 6029 — .1035 
1 6302 2375 
.6803 2908 
12 7060 } J 3152 J 
Mean      .5533  .6396  .6681  .6778  .0043  .1516  .1960  .2310
Variance  .0243  .0014  .0010  -      .0563  .0343  .0237  -





* The coefficients based on the “average” segment are +.6301 for $s = 1$ and +.0493 for $s = 2$.
† In this case, (10) is applied separately to each of the two halves of the series rather than to the
whole series as discussed in the text.


series, the dependability of known corrections for the bias is uncertain. 
In the present example, $r_1$ and $r_2$ computed according to (10), while
below the true values, are nearer than $\bar{r}_{sj}$ or $\bar{r}_{s(j,j+1)}$. Note the improvement
in the estimates, compared to the true values of .733 and .306,
as $n$ increases.
Applying Method I, we have


$r_1$ (corrected for bias) $= 2(.6396) - .5533 = +.7259$





which is very close to the true value, +.733, and

$r_2$ (corrected for bias) $= 2(.1516) - .0043 = +.2989$

which is similarly close to its true value, +.306.
Applying Method II, we have

$r_1$ (corrected for bias) $= 2(.6778) - .6681 = +.6875$, and
$r_2$ (corrected for bias) $= 2(.2310) - .1960 = +.2660$.


Estimating Autoregressive Parameters. A straightforward method of
estimating the α's in (1) from a series of observations requires simply
the computation of serial correlation coefficients for each of the first
two lags ($r_1$ and $r_2$) and substituting for the unknown autocorrelation
coefficients ($\rho_1$ and $\rho_2$, respectively) in the recurrence relationship,

$$\rho_1\alpha_0 + \rho_0\alpha_1 + \rho_1\alpha_2 = 0, \qquad \rho_2\alpha_0 + \rho_1\alpha_1 + \rho_0\alpha_2 = 0. \tag{11}$$

Since $\rho_0$ is necessarily unity and $\alpha_0$ is taken as 1, we have

$$a_1 = -\frac{r_1(1 - r_2)}{1 - r_1^2} \quad\text{and}\quad a_2 = \frac{r_1^2 - r_2}{1 - r_1^2}, \tag{12}$$

in which the a's are estimates of the α's and the r's are estimates of
the ρ's. This method, proposed originally by Yule [24], was examined
empirically along with several alternatives and recommended by Kendall
[8] for series of about 60 observations or more.8
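As a check on (12), the following Python sketch solves the two-lag recurrence for the coefficients and, in the other direction, recovers the first two autocorrelations implied by given coefficients. The function names are illustrative assumptions; the numerical inputs are the true values and the average-segment values quoted in Table II.

```python
def yule_estimates(r1: float, r2: float) -> tuple[float, float]:
    """Coefficients a1, a2 from the first two serial correlations, as in (12)."""
    denom = 1.0 - r1 * r1
    a1 = -r1 * (1.0 - r2) / denom
    a2 = (r1 * r1 - r2) / denom
    return a1, a2


def implied_rho(a1: float, a2: float) -> tuple[float, float]:
    """First two autocorrelations implied by (11) with rho_0 = 1 and a_0 = 1."""
    rho1 = -a1 / (1.0 + a2)
    rho2 = -a1 * rho1 - a2
    return rho1, rho2


rho1, rho2 = implied_rho(-1.1, 0.5)      # true coefficients of the experiment
a1_hat, a2_hat = yule_estimates(0.6301, 0.0493)  # "average" segment values

print(round(rho1, 3), round(rho2, 3))
print(round(a1_hat, 3), round(a2_hat, 3))
```

With the true coefficients the implied autocorrelations round to .733 and .307, in line with the true values quoted in the text, and the average-segment correlations give coefficients close to the -.993 and +.577 reported in footnote 8.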





8 In a paper by Rao and Som [17], one of the methods considered by Kendall [8] was found to be
superior to (12) for series of the form (1) with n = 35, $\alpha_1 = -0.7$ and $\alpha_2 = +0.6125$. This method consists
of minimising

$$\sum_s \left(r_{s+2} + a_1 r_{s+1} + a_2 r_s\right)^2$$


and involves the computation of $r_s$ for several lags (in the example considered, $r_1$ through $r_{15}$ were
used). Since the Rao and Som paper came to the author's attention after the present study was es- 
sentially complete, the method is not examined extensively here. However, autoregression coefficients 
were estimated from the “average” segment (above) according to each of the two methods. Equation 
(12) leads to a’s of —.993 and +.577 while the alternative method produces a’s of —.991 and +.561. 
The results are very close and both methods are badly biased. 

Despite the Rao and Som results, there are strong reasons for preferring (12): 

1. The shorter the series, the less favorable are the results we might expect from minimising

$$\sum_s \left(r_{s+2} + a_1 r_{s+1} + a_2 r_s\right)^2;$$

values of $r_s$ become more variable as s is increased, especially in the very short series in which we are
interested.

2. The methods as compared by Rao and Som were uncorrected for bias. Since both methods are
biased, it is the efficacy of the correction for bias, not the superiority of the uncorrected method, that is
the critical consideration.

3. The use of (12) requires considerably less computation. 







The a’s were computed from various r’s using (12) and are shown in 
Table II. The deviations of the a’s from their true values follow 
roughly the same pattern as those of the basic r’s from their true values 
with one outstanding exception. The a’s based on the average segment 
come much nearer to their respective parameters than do the r’s from 
which they were computed. There is no readily apparent reason for this. 

The Effect of Correlation Between Segments. In the experimental series,
segments were independent of one another, but in actual series the 12
segments will usually be highly intercorrelated. Fortunately, this
should not affect estimates based on the suggested methods.

For simplicity, we shall use Orcutt’s [11] definition of $r_s$ (with zero
mean),

$$r_s = \frac{n \sum_{i=1}^{n-s} x_i x_{i+s}}{(n - s) \sum_{i=1}^{n} x_i^2}. \tag{13}$$
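A minimal Python sketch of the zero-mean serial correlation coefficient in the form just given, n times the lagged cross-product sum over (n - s) times the sum of squares. The function name `serial_corr` and the toy series are assumptions made for illustration only.

```python
def serial_corr(x: list[float], s: int) -> float:
    """Zero-mean serial correlation at lag s: n*sum(x_i x_{i+s}) / ((n-s)*sum(x_i^2))."""
    n = len(x)
    if not 0 < s < n:
        raise ValueError("lag must satisfy 0 < s < len(x)")
    numerator = n * sum(x[i] * x[i + s] for i in range(n - s))
    denominator = (n - s) * sum(v * v for v in x)
    return numerator / denominator


# A strictly alternating series is perfectly negatively correlated at lag 1
# and perfectly positively correlated at lag 2.
alternating = [1.0, -1.0, 1.0, -1.0]
lag1 = serial_corr(alternating, 1)
lag2 = serial_corr(alternating, 2)
```

The factor n/(n - s) compensates for the numerator having s fewer terms than the denominator.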


This definition differs from that used above only with respect to “end- 
effects” and the generality of the exposition should not be affected. 
Let us compare the estimates r, based on the series of annual totals 
(or annual averages) with those based on the whole series (segmented). 
Using (13), 


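$$\frac{n \sum_{i=1}^{n-s} x_i x_{i+s}}{(n-s) \sum_{i=1}^{n} x_i^2}
= \frac{n \sum_{i=1}^{n-s} \sum_{j=1}^{12} x_{ij} x_{i+s,j} + n \sum_{i=1}^{n-s} \sum_{j=1}^{12} \sum_{\substack{p=1-j \\ p \neq 0}}^{12-j} x_{ij} x_{i+s,j+p}}
{(n-s) \sum_{i=1}^{n} \sum_{j=1}^{12} x_{ij}^2 + (n-s) \sum_{i=1}^{n} \sum_{j=1}^{12} \sum_{\substack{p=1-j \\ p \neq 0}}^{12-j} x_{ij} x_{i,j+p}} \tag{14}$$

$i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, 12$; $p = 1-j, \ldots, 12-j$ ($p \neq 0$); where

$$x_i = \sum_{j=1}^{12} x_{ij}.$$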


Where the 12 series are independent, the terms to the right of the plus 
signs will tend toward zero and r, based on the series of annual totals 
(or averages) will approximate that based on the whole series. 

We shall show that, even where the 12 series are intercorrelated, 
p, is estimated by the terms to the left of the plus signs in (14). We 













TABLE II

SPECIFIED ESTIMATES OF AUTOCORRELATION AND
AUTOREGRESSION COEFFICIENTS

Definition of                                     Number          r_s
Correlation Coefficient       Formula      n      of Series   s=1     s=2      a_1      a_2

ρ_s *                         --           ∞      --          .733    .306    -1.1      .5
r_s, average segment          (7)          14     1           .6301   .0493   -.993     .577
r_{s,j}                       (7), (6)     14     12          .5533   .0043   -.794     .435
r_{s(j, j+1)}                 (8), (5)     28     6           .6396   .1516   -.918     .436
½(₁r_s + ₂r_s), method II     (10)†        84     2           .6681   .1960   -.970     .452
r_s, whole series (seg.)      (10)         168    1           .6778   .2310   -.964     .422
r_s (corrected, method I)     (4)          --     --          .7259   .2989   -1.076    .482
r_s (corrected, method II)    (9)          --     --          .6875   .2660   -.957     .392








* The ρ's that would be obtained from a series like (1) of infinite length with the α's shown are given
by Kendall's [6, 8] formula,

$$\rho_s = \frac{\left(\sqrt{\alpha_2}\right)^{s} \sin(s\theta + \psi)}{\sin\psi},$$

where

$$\theta = \cos^{-1}\!\left(\frac{-\alpha_1}{2\sqrt{\alpha_2}}\right) \quad\text{and}\quad \tan\psi = \frac{1 + \alpha_2}{1 - \alpha_2}\tan\theta.$$

† In this case, formula (10) is applied separately to each of the two halves of the series rather than
to the whole series as discussed in the text.


have assumed that all segments have the same autocorrelation for each
annual lag. It is reasonable that,9 asymptotically,

$$x_{i,j+p} = \beta_{jp}\, x_{ij} + \epsilon_{i,j+p} \quad\text{and}\quad x_{i+s,j+p} = \beta_{jp}\, x_{i+s,j} + \epsilon_{i+s,j+p}, \tag{15}$$

where the β's are inter-segment linear regression coefficients and the ε's
are uncorrelated with all x's and all other ε's. Substituting in (14)
according to these relationships, we have

$$\frac{n \sum_{i=1}^{n-s} \sum_{j=1}^{12} x_{ij} x_{i+s,j} + n \sum_{j=1}^{12} \sum_{\substack{p=1-j \\ p \neq 0}}^{12-j} \beta_{jp} \sum_{i=1}^{n-s} x_{ij} x_{i+s,j}}
{(n-s) \sum_{i=1}^{n} \sum_{j=1}^{12} x_{ij}^2 + (n-s) \sum_{j=1}^{12} \sum_{\substack{p=1-j \\ p \neq 0}}^{12-j} \beta_{jp} \sum_{i=1}^{n} x_{ij}^2}. \tag{16}$$

The β's are factors which leave the terms to the right of the plus signs
proportionate to those to the left of the plus signs. Thus,


9 This was suggested by Robert M. Solow of Massachusetts Institute of Technology, one of the
referees who reviewed this paper.










$$\frac{n \sum_{i=1}^{n-s} x_i x_{i+s}}{(n - s) \sum_{i=1}^{n} x_i^2} \quad\text{and}\quad \frac{n \sum_{j=1}^{12} \sum_{i=1}^{n-s} x_{ij} x_{i+s,j}}{(n - s) \sum_{j=1}^{12} \sum_{i=1}^{n} x_{ij}^2} \tag{17}$$


have the same probability limit whether or not the segments are inter- 
correlated. However, estimates made from short series according to the 
left side of (17) have the serious biases noted above. Use of the much 
larger number of observations represented by the right side of (17) 
provides us with a vehicle for overcoming these biases, as currently 
proposed. 

Of course, the relationships (15) are asymptotic and it is not certain 
exactly how inter-segment correlation affects estimates based on very 
short series. A sampling study would be desirable. 

Testing Estimates Based on Actual Series. While interdependence 
among the segments should not affect estimates of p,, it does make it 
impossible to test the significance of coefficients computed according to 
the suggested methods from actual series. Pending further study, the 
following approach may be considered a step toward independent 
segments and, eventually, methods of testing results in actual series. 
The approach requires a considerable amount of computation, but in 
view of the current rapid development of high-speed computers, this 
may not always be a serious difficulty. 

If the original segments (months) are transformed by subtraction,
the differences will be new segments that are often uncorrelated monthly
but serially correlated annually.10 Since the existence of monthly
serial correlation will destroy any tests of significance, it would seem 
necessary to compute the matrix of inter-month correlation coefficients 
so that the entire picture can be examined. 

Limitations of the Present Study. It is clear that much remains to be 
done in the way of empirical investigation of the suggested approach. 

1. The definition of r, developed by the author is only one of many 
probable ones. Other definitions may be better, and, as noted above, 
this possibility is examined by the author in a paper which is in prog- 
ress. 

2. The effect of seasonal variation on the present method has not 
been examined empirically. It is possible that much of the complexity 
due to seasonality could be cleared up by adjusting the original vari- 





10 Depending on the particular series, other expressions of monthly change (relative rather than 
absolute, etc.) might be appropriate. 













ates for seasonal variation. However, it is not known to what extent 
the adjustments, themselves, affect the serial structure. The matter is
one for further study, although the author suspects that the problem 
of seasonality is not too serious. 

3. An efficient method of examining the efficacy of the present
approach in short actual series has not yet been developed.11

4. The scheme (1) implies that all the error is incorporated into the
system. In practice this is rarely, if ever, realized, the complications
taking the form of “superposed variation”12 or “errors in variables”13
that may or may not be serially correlated. Attempts should be made
to develop a useful method of estimation in short series where superposed
variation or errors of observation exist. However, little is known
about the real nature of superposed variation as it affects actual series
so the amount to be learned via experimental series is problematical.14
On the other hand, analyzing actual series presents the difficulty of 
unknown parameters, and a thorough study of superposed variation 
depends on the ability to derive the tests suggested in item 3 above. 

5. The scheme (1) is not the only possible one nor necessarily the 
most useful. The choice of a particular model is important, since the 
bias considered above is a function not only of the sample size, as shown 
by Sastry [19], but also of the number of lags and the level of the a’s. 
Orcutt [11] or Cochrane and Orcutt [4] suggest a different scheme 
which might be examined along with other alternatives. A thorough 
examination would consider various a’s and sample sizes for each of 
the model-types. 

6. As noted in footnote 8, the method (12) may not be the best. The 
alternative method discussed has not been investigated in connection 
with the approach suggested in the present paper. Special attention 
should be given to the problem of choosing the optimum number of 
lags to use in the relationship to be minimized. Where two lags are used, 
the methods are identical. 

7. The exact effect of inter-segment correlation on estimates based 
on very short series has not been examined. An empirical study is 
desirable. 





11 For large-sample tests of fit in autoregressive sequences, see Quenouille [15], Walker [22], Bartlett
and Diananda [3], Rao and Som [17], and Rao [18]. For considerations which may aid in the development
of small-sample tests, see Sastry [20].

12 See Kendall [6, 9] and Quenouille [15].

13 See Stone [21] and Cochrane and Orcutt [4].

14 For methods of detecting superposed variation or making estimates eliminating its effect, where
a large number of lags can be considered, see Quenouille [15], Kendall [6] and Ghurye [5].










CONCLUSION 


Of the limitations summarized above, the most important is that 
outlined in 4—superposed variation or errors of observation not in- 
corporated into the autoregressive scheme. Where a series is long enough 
for the r’s to be computed for many lags, methods of overcoming superposed
variation are available.14 However, for series of the length considered
in the present study, these methods are not applicable and substitute
methods are not known.

For the practical analyst to employ the approach suggested in this 
paper in his research, it will be necessary for him to consider first the 
extent to which his series are affected by superposed variation not in- 
corporated into the generating process. This examination will require 
great care since superposed random variation of even 5 or 10 per cent 
can be serious, and, at the current level of our knowledge, this examina- 
tion must be mainly intuitive. Where “superposed variation” is serially 
correlated, it presents less difficulty (provided it extends over the life 
of the series), since it will then be incorporated into the autoregressive 
scheme. Fortunately, most errors of observation, etc., are serially
correlated.15 However, a superposed variation that is appreciable will
obviously alter the basic scheme where its serial structure differs from 
that of the basic series. 

If the researcher realizes the limitations recorded above, he has avail- 
able a crude tool where none was known before. If the problem of super- 
posed variation can be solved, the tool should become quite effective. 


REFERENCES 


[1] Anderson, T. W., “On the theory of testing serial correlation,” Skandinavisk 
Aktuarietidskrift, 1948, 88-116. 

[2] Bartlett, M. S., “On the theoretical specification and sampling properties of
autocorrelated time-series,” Supplement to the Journal of the Royal Statistical 
Society, 8 (1946), 27-41. 

[3] Bartlett, M. S., and Diananda, P. H., “Extensions of Quenouille’s test for 
autoregressive schemes,” Journal of the Royal Statistical Society, B-12 
(1950), 108-15. 

[4] Cochrane, D., and Orcutt, G. H., “Application of least squares regression to 
relationships containing autocorrelated error terms,” Journal of the Ameri- 
can Statistical Association, 44 (1949), 32-61. 

[5] Ghurye, S. G., “A method of estimating the parameters of an autoregressive 
time series,” Biometrika, 37 (1950), 173-8. 

[6] Kendall, Maurice G., The Advanced Theory of Statistics, II., London: 
Charles Griffin and Company, 1948, Chapter 30. 


15 For examples of typical errors of observation and a brief discussion of the nature of their serial
interdependence, see Cochrane and Orcutt [4].










[7] Kendall, M. G., “Contributions to the study of oscillatory time-series,” 
Occasional Paper IX, National Institute of Economic and Social Research, 
Cambridge: Cambridge University Press, 1946. 

[8] Kendall, Maurice G., “The estimation of parameters in linear autoregressive
time-series,” Econometrica, 17 (1949), July Meeting Supplement, 44-57.

[9] Kendall, M. G., “On autoregressive time-series,” Biometrika, 33 (1944), 
105-22. 

[10] Moran, P. A. P., “Some theorems on time-series. II. The significance of the 
serial correlation coefficient,” Biometrika, 35 (1948), 255-60. 

[11] Orcutt, G. H., “A study of the autoregressive nature of the time-series used 
for Tinbergen’s model of the economic system of the United States, 1919- 
1932,” Journal of the Royal Statistical Society, B-10 (1948), 1-53. 

[12] Orcutt, Guy H., and Cochrane, Donald, “A sampling study of the merits of 
autoregressive and reduced form transformations in regression analysis,” 
Journal of the American Statistical Association, 44 (1949), 356-72. 

[13] Orcutt, G. H., and James, S. F., “Testing the significance of correlation 
between time-series,” Biometrika, 35 (1948), 397-413. 

[14] Quenouille, M. H., “Approximate tests of correlation in time-series,”
Journal of the Royal Statistical Society, B-11 (1949), 68-84. 

[15] Quenouille, M. H., “A large sample test for the goodness of fit of autore- 
gressive schemes,” Journal of the Royal Statistical Society, 110 (1947), 123-9. 

[16] Quenouille, M. H., “Some results in the testing of serial coefficients,” 
Biometrika, 35 (1948), 261-7. 

[17] Rao, S. Raja, and Som, R. K., “The applicability of large sample tests for
moving average and autoregressive schemes to series of short length—an
experimental study, Part 2: autoregressive series,” Sankhyā: The Indian
Journal of Statistics, 11 (1951), 239-56.

[18] Rao, C. Radhakrishna, “The applicability of large sample tests for moving
average and autoregressive schemes to series of short length—an experimental
study, Part 3: the discriminant function approach in the classification
of time series,” Sankhyā: The Indian Journal of Statistics, 11 (1951),
257-72.

[19] Sastry, A. Sree Rama, “Bias in estimation of serial correlation coefficients,”
Sankhyā: The Indian Journal of Statistics, 11 (1951), 281-96.

[20] Sastry, A. Sree Rama, “Some moments of moment-statistics and their use
in tests of significance in autocorrelated series,” Sankhyā: The Indian Journal
of Statistics, 11 (1951), 297-308.

[21] Stone, Richard, “The role of measurement in economics,” Department of
Applied Economics Monograph No. 3, 1950. Cambridge: Cambridge University
Press.

[22] Walker, A. M., “Note on a generalization of the large sample goodness of 
fit test for linear autoregressive schemes,” Journal of the Royal Statistical 
Society, B-12 (1950), 102-7. 

[23] Wold, Herman, A Study in the Analysis of Stationary Time-Series, Uppsala: 
Almqvist & Wiksells Boktryckeri, 1938. 

[24] Yule, G. Udny, “On a method of investigating periodicities in disturbed 
series, with special reference to Wolfer’s sunspot numbers,” Philosophical 
Transactions, Series A, 226 (1927), 267-98. 





ON GENERALIZATIONS OF TCHEBYCHEF’S INEQUALITY 


H. J. Godwin
University College of Swansea, Wales


This paper is purely expository and contains an account of inequalities
of the type of Tchebychef’s, i.e., inequalities for the value of a
distribution function in terms of known facts about the distribution.
Such facts may be numerical, e.g., moments or range, or geometrical,
e.g., the property of being unimodal or monotonic in some given
range. The distributions to which they apply may be classified as
single-variate, distributions of averages, or multivariate distributions. On
pages 924-5 there is given a table of the results quoted, showing what
data each uses and the type of distribution to which it applies. At the
end of the paper are some remarks on possible future developments and
on applications.


INTRODUCTION 


If the frequency function f(x) of a variate x is known, then we can
calculate P(a, b), the probability that $a \le x \le b$, exactly. If we know
nothing about f(x) then we can make only the trivial statement
$0 \le P(a, b) \le 1$. Between these two extremes of complete knowledge
and complete ignorance lie the situations in which we have some information
about f(x) and so can make a non-trivial statement about
P(a, b). The purpose of this paper is to give an account of the results of
this kind which have been obtained.


EXPECTATIONS AND MOMENTS 


We shall assume throughout the paper that we are dealing with
continuous frequency functions, except where discrete ones are specifically
mentioned. Consequently we write the expectation of the function
$\phi(x)$ as

$$E[\phi(x)] = \int_{-\infty}^{\infty} \phi(x) f(x)\,dx$$

whereas in the discrete case this would be

$$\sum_i \phi(x_i) f(x_i).$$


We leave it to the reader to make the necessary modifications in for- 
mulas and statements to cover non-continuous cases. The most serious 







924 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1955 


TABLE OF RESULTS QUOTED 


The first column gives the number of the result below, the second the type of data and the third 
the type of region whose probability is being assigned bounds or the type of result if it is different from 
the general kind. 








Distributions of Single Variates 





Any j absolute moments vj, * * * , 74;- Interval from origin. 

Mr, °° ° 5 fame Semi-infinite interval. 

wi, * * * , un} finite range. Interval with one end at end of range. 

By oa Finite interval. 

My M2, Me. Two finite intervals, symmetrically placed 

about mean. 

“, wa; finite range. Interval symmetrical about mean. 

Expectation of one function. Finite interval. 

Expectation of two functions. Interval from origin. 

Mean range in samples of n. l.u.b. of probability in all intervals of given 

length. 

ve; graph of P(0, z) lies below certain line | Interval from origin. 
in some range. 

Conditions on derivatives in a Sz $b; ex- | Interval OSz Sz. 
pectation of power of z over aSz 3b; 
value of f(z»). 

As (zi) without f(z). Interval from origin. 

As (zit). Interval from origin. 

vr; monotonicity condition on f(z). Interval from origin. 

vz; f(z) has maximum at 0. Interval from origin. 

Discrete distribution; »; monotonicity | Interval symmetrical about origin. 
condition on probabilities. 

ui(0); f(z) has single maximum some- | Interval symmetrical about origin. 
where. 

v1, ¥2; mode at origin. Interval from origin. 

ver; f(z) has single maximum at given | Interval from origin. 
point. 

Mean range in sample of n; f(z) is sym- | Interval symmetrical about 0. 
metrical about 0 and has single maxi- 
mum. 

vn; (x) non-decreasing in interval from 0. | Interval from origin. 

va; /(z) non-increasing in interval from 0. | Interval from origin. 











Distributions of averages or totals 





#, wx; sample estimate of us. Interval symmetrical about yu. 
My Ha. Interval symmetrical about u. 
Restriction on all moments (or finite | Interval symmetrical about uz. 
range). 
As (zzv) but for cumulants. Interval symmetrical about yu. 
pa, Pa. l.u.b. of probability in interval of given 
length. 
HM, (1, Pe. Semi-infinite interval. 
HM, M2, Va. Semi-infinite interval. 
Symmetrical unimodal distribution; finite | Interval from origin. 
range. 
Finite range; Zys. Largest of cumulative totals in given in- 
terval. 














TABLE OF RESULTS QUOTED—(continued) 





Multivariate distributions (first order moments zero) 





Second order moments. 

Moments up to 2s*® order. 

Second order moments; independent vari- 
ates. 

Second order moments. 

Second order moments; monotonicity con- 
dition on distribution function. 

Observed frequencies in independent 
trials. 


Certain rectangles. 
Certain ellipses. 
Any ellipse with center at origin. 


Certain quadrics or parallelepipeds. 

Ellipsoid with center at origin and axes 
parallel to coordinate axes. 

Bounds on probability of given size of 
measures of discrepancy of observed 


and true frequencies, or two lots of ob- 
served frequencies. 
Region inside contour. 








(crrvitt) | Contour moments. 





difference between the two kinds of distribution is that in the continuous
case the probability that $a \le x \le b$ is the same as the probability
that $a < x < b$, but in the discrete case, if there is positive probability
of x = a or x = b, the two expressions are different.

The most extensive results are those which depend on a knowledge 
of some moments of the distribution. Since the notation used for mo- 
ments varies somewhat we state explicitly the one to be used here. 

If a is any value we put $E[(x-a)^n] = \mu_n(a)$, and for absolute moments
$E(|x-a|^n) = \nu_n(a)$. We then have $\mu_0(a) = \nu_0(a) = 1$ for all a. We write
$\mu$ for the mean value $\mu_1(0)$ and write $\mu_n(\mu)$ simply as $\mu_n$. In the case
of absolute moments we shall always measure the variate from the
point about which moments are taken, which means that we take a
to be 0, and then denote the moment by $\nu_n$.

It should be noted that $\nu_n$ is the same for the frequency functions
f(x) and g(x), where g(x) = 0 for x < 0, and g(x) = f(x) + f(-x) for $x \ge 0$.
The probability which is P(-d, d) for f(x) is P(0, d) for g(x) and it
will be convenient to state results for absolute moments always in
terms of distributions of non-negative variates. As long as we are
concerned with probabilities P(-d, d) over ranges symmetrical about
the origin, there will be no loss of generality.


RESULTS DEPENDING ON MOMENTS 


Suppose that we are given for f(x) either (i) j absolute moments
$\nu_{i_1}, \ldots, \nu_{i_j}$ (where $i_1 < i_2 < \cdots < i_j$) or (ii) the 2n moments $\mu_1, \ldots, \mu_{2n}$
or (iii) the n moments $\mu_1, \ldots, \mu_n$ when f(x) is zero outside the
interval $b \le x \le c$. Then bounds for $P(0, d)$, $P(-\infty, d)$, $P(b, d)$ in these
cases respectively can be found as follows. We construct discrete dis- 
tributions of certain kinds whose moments equal the corresponding 













ones of f(x); each of these distributions has non-zero probabilities at
only a finite number of points, which constitute the spectrum of the
distribution. In the three cases the spectra consist of the following
points.

(i) if j is even; d and not more than $\tfrac{1}{2}j$ other positive points, or 0, d,
not more than $\tfrac{1}{2}(j-2)$ other positive points, and $\infty$. (At $\infty$ is placed
zero probability in such a way that $\nu_{i_j}$ is affected but no moment of
lower order.)

(i) if j is odd; 0, d and not more than $\tfrac{1}{2}(j-1)$ other positive points.

(ii) d and n other points.

(iii) if n is even; d and not more than $\tfrac{1}{2}n$ other points in the interval
b < x < c, or b, c, d, and not more than $\tfrac{1}{2}(n-2)$ other points in the
interval b < x < c.

(iii) if n is odd; b, d and not more than $\tfrac{1}{2}(n-1)$ other points in the
interval b < x < c, or d, c and not more than $\tfrac{1}{2}(n-1)$ other points in
the interval b < x < c.

It has been shown that in each case just one construction is possible,
except that for a finite number of values of d, depending on the given
moments, we have to replace d by $d + \delta$ and let $\delta$ tend to 0 to obtain
the spectrum and associated probabilities.

We have now constructed in each case discrete distributions, consisting
of a finite number of points and d, with certain probabilities at
each. The lower bounds for the probabilities $P(0, d)$, $P(-\infty, d)$ and
$P(b, d)$ are the sums of the probabilities in the discrete distributions
at points to the left of d, and the upper bounds the sum of probabilities
at points to the left of d and at d. Since these bounds are given arbitrarily
closely by distributions departing by infinitesimal amounts from
the discrete distributions which have been constructed, the inequalities
cannot be improved on except by excluding distributions of this discrete
type; we shall return to this point later. Inequalities which cannot
be improved on, under the given conditions, are called best possible.

A proof of the above results in case (i) is given by Wald [38] and 
outlines of the proofs in cases (ii) and (iii), with further references, by 
Shohat and Tamarkin [35, pp. 43 and 79]. 

To show the method in operation let us take first the case where we
know the one absolute moment $\nu_n$. We have to find a discrete distribution
having probabilities p at 0 and q at d and with this moment.
Hence we must have

$$p + q = 1, \qquad q d^n = \nu_n,$$







whence

$$p = 1 - \nu_n/d^n, \qquad q = \nu_n/d^n$$

and

$$1 - \nu_n/d^n \le P(0, d) \le 1.$$

The right-hand inequality is trivial, but for $d^n > \nu_n$ the left-hand one is
not.

If n = 1 we have $1 - \nu_1/d \le P(0, d)$, which is known as Markoff’s
Lemma.

If n = 2, since $\mu_2(0) = \nu_2$ we can write the inequality as
$1 - \mu_2(0)/d^2 \le P(-d, d)$. If the standard deviation of the distribution is $\sigma$ then,
by a change of origin, we can write the inequality as
$1 - t^{-2} \le P(\mu - t\sigma, \mu + t\sigma)$. This inequality, discovered by Bienaymé in 1853 and
rediscovered by Tchebychef in 1867, is often called Tchebychef’s inequality
and gives the name to the whole subject.
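The one-moment bound is easy to evaluate numerically. The Python sketch below computes the lower bound $1 - \nu_n/d^n$ for P(0, d), truncated at the trivial value 0, and compares it with an exactly known probability. The function name and the choice of the unit exponential distribution as a test case are assumptions made for illustration.

```python
import math


def moment_lower_bound(nu_n: float, d: float, n: int) -> float:
    """Lower bound 1 - nu_n / d**n for P(0, d); 0 when the bound is trivial."""
    if d <= 0:
        raise ValueError("d must be positive")
    return max(0.0, 1.0 - nu_n / d ** n)


# n = 2 with d = t * sigma gives the familiar 1 - 1/t**2 (here t = 2, sigma = 1).
tchebychef = moment_lower_bound(1.0, 2.0, 2)

# n = 1 (Markoff) for a unit exponential variate, whose mean is 1.
markoff = moment_lower_bound(1.0, 4.0, 1)
exact = 1.0 - math.exp(-4.0)  # true P(0, 4) for the unit exponential

print(tchebychef, markoff, round(exact, 4))
```

In this example the bound of .75 holds but is far from the exact probability, which illustrates why the later sections look for sharper inequalities under additional information.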

As a second example let $\nu_n$, $\nu_{2n}$ be known. Then we construct a
distribution with either (A) probability p at d and 1 - p at e, or (B)
probabilities p at d, 1 - p at 0 and 0 at $\infty$. We require

(A) $p d^n + (1-p) e^n = \nu_n$, $\quad p d^{2n} + (1-p) e^{2n} = \nu_{2n}$, or
(B) $p d^n = \nu_n$, $\quad p d^{2n} \le \nu_{2n}$.

Eliminating $e^n$ in (A) we have $(\nu_n - p d^n)^2 = (1-p)(\nu_{2n} - p d^{2n})$, whence

$$p = \frac{\nu_{2n} - \nu_n^2}{\nu_{2n} - \nu_n^2 + (d^n - \nu_n)^2}, \qquad e^n = \frac{\nu_n d^n - \nu_{2n}}{d^n - \nu_n}.$$

In (B) we have $p = \nu_n/d^n$. Since, by Schwarz’s inequality, $\nu_{2n} \ge \nu_n^2$ we have
$0 \le p \le 1$ in (A), but the condition $e^n \ge 0$ requires either $d^n \le \nu_n$ or
$d^n \nu_n \ge \nu_{2n}$. In (B) we have $\nu_{2n} \ge d^n \nu_n$ and we need $\nu_n \le d^n$ so that
$p \le 1$. Thus for a given set of values of d, $\nu_n$, $\nu_{2n}$ just one of (A) or (B)
applies. (Except that if $\nu_n = d^n$ or $\nu_n d^n = \nu_{2n}$ we have limiting cases which
belong to either (A) or (B)).

If $d^n \le \nu_n$ then in (A) we have $e \ge d$ so that

$$0 \le P(0, d) \le p = \frac{\nu_{2n} - \nu_n^2}{\nu_{2n} - \nu_n^2 + (d^n - \nu_n)^2}.$$

If $\nu_n \le d^n \le \nu_{2n}/\nu_n$ then, from (B),

$$1 - p = 1 - \nu_n/d^n \le P(0, d) \le 1.$$

If $d^n \ge \nu_{2n}/\nu_n$ so that $d^n \ge \nu_n$ then, in (A), we have $e \le d$ so that

$$1 - p = \frac{(d^n - \nu_n)^2}{\nu_{2n} - \nu_n^2 + (d^n - \nu_n)^2} \le P(0, d) \le 1.$$

These inequalities were first given by Cantelli in 1928.
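The three regimes of the two-moment result can be collected into one function. The following Python sketch returns the pair (lower bound, upper bound) for P(0, d) given the absolute moments of orders n and 2n; the function name, the argument order and the unit exponential test case (first moment 1, second moment 2) are assumptions introduced for illustration.

```python
import math


def two_moment_bounds(nu_n: float, nu_2n: float, d: float, n: int) -> tuple[float, float]:
    """(lower, upper) bounds on P(0, d) from absolute moments of orders n and 2n."""
    spread = nu_2n - nu_n ** 2          # non-negative by Schwarz's inequality
    if spread < 0 or d <= 0:
        raise ValueError("need nu_2n >= nu_n**2 and d > 0")
    dn = d ** n
    gap2 = (dn - nu_n) ** 2
    if dn <= nu_n:                      # case (A) with e >= d
        upper = spread / (spread + gap2) if spread + gap2 > 0 else 1.0
        return 0.0, upper
    if dn * nu_n <= nu_2n:              # case (B)
        return 1.0 - nu_n / dn, 1.0
    return gap2 / (spread + gap2), 1.0  # case (A) with e <= d


# Unit exponential variate with n = 1: nu_1 = 1, nu_2 = 2.
small = two_moment_bounds(1.0, 2.0, 0.5, 1)
middle = two_moment_bounds(1.0, 2.0, 1.5, 1)
large = two_moment_bounds(1.0, 2.0, 3.0, 1)
exact = [1.0 - math.exp(-d) for d in (0.5, 1.5, 3.0)]
```

For d = 3 the lower bound is .8, compared with .75 when only the second moment is used in the one-moment bound, so the extra moment tightens the inequality in this example.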







For some solutions in the case (ii) see Zelen [40]. 

A simple extension of the inequality using one moment was made in
its most general form by Lurquin [20]; if the events a, « + - , e: occur 
with probabilities p:, ---, p, and if with event e; is associated the 
measurement 2; with absolute moment »,(t) then the probability that 
for at least one ¢ we have | x| <d is greater than 1—( >-:pw,(t)/d*, 

Since the knowledge of a number of moments gives bounds for
$P(-\infty, d)$ it follows that if a number of corresponding moments of
two frequency functions $f_1(x)$ and $f_2(x)$ are equal, then the respective
probabilities $P_1(-\infty, d)$ and $P_2(-\infty, d)$ will not differ widely. Bounds
for $|P_1(-\infty, d) - P_2(-\infty, d)|$ in terms of the moments have been
given by a number of authors. Khamis [16] quotes some of these, a
simple form being


| P:i(—, d) — P,(—, d)| 
Ho(@) + + > #a(a) uo(Z) + + + pnts (d) 


Hn(@) +++ uen(a) Hn+i(d) ++ * uen(d) 


where a is any value. Khamis also improves this result in terms of a
knowledge of bounds for the ratio $f_1(x)/f_2(x)$.

We have so far considered P(a, b) only for the cases in which a is
at the extreme left-hand end of the distribution. Inequalities for general
intervals can be derived by subtraction from the inequalities which
we have given, but will now no longer, in general, be best possible.
Selberg [33] has however given a best possible result in terms of $\mu_2$
for any interval including the mean value. The result is
(iv)

$$P(\mu - \alpha, \mu + \beta) \ge \alpha^2/(\alpha^2 + \mu_2) \quad\text{for } \alpha(\beta - \alpha) \ge 2\mu_2;$$

$$P(\mu - \alpha, \mu + \beta) \ge (4\alpha\beta - 4\mu_2)/(\alpha + \beta)^2 \quad\text{for } 2\alpha\beta \ge 2\mu_2 \ge \alpha(\beta - \alpha),$$

where $\beta \ge \alpha \ge 0$.

For $\mu_2 \ge \alpha\beta$ we have only the trivial inequality

$$P(\mu - \alpha, \mu + \beta) \ge 0.$$
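Result (iv) can be sketched in Python as a single function of the two half-widths and the variance. The name `selberg_lower_bound` and the requirement of a strictly positive variance are assumptions of this illustration; the three branches follow the three cases stated above.

```python
def selberg_lower_bound(alpha: float, beta: float, mu2: float) -> float:
    """Lower bound on P(mu - alpha, mu + beta) given the variance mu2, 0 <= alpha <= beta."""
    if not 0 <= alpha <= beta:
        raise ValueError("need 0 <= alpha <= beta")
    if mu2 <= 0:
        raise ValueError("this sketch assumes a strictly positive variance")
    if alpha * (beta - alpha) >= 2 * mu2:
        return alpha ** 2 / (alpha ** 2 + mu2)
    if alpha * beta >= mu2:
        return 4 * (alpha * beta - mu2) / (alpha + beta) ** 2
    return 0.0  # only the trivial inequality is available


# A symmetric interval alpha = beta = t*sigma reproduces 1 - 1/t**2 (t = 2 here).
symmetric = selberg_lower_bound(2.0, 2.0, 1.0)

# A strongly one-sided interval falls in the first case.
one_sided = selberg_lower_bound(1.0, 5.0, 1.0)

# An interval too short relative to the variance gives only the trivial bound.
trivial = selberg_lower_bound(0.5, 0.5, 1.0)
```

The symmetric case gives a quick consistency check: with $\alpha = \beta = t\sigma$ the second branch reduces to $1 - t^{-2}$, the Bienaymé-Tchebychef bound.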
Guttman [13] shows that 
(v) 
P(u — ko, wp — kao) + P(u + kee, uw + kyo) = 1 — d-, 


where k,=(1++da)"/2, ky=(1—Ya)"!? or ky = (1-+a(d?+1)/VO8—1))"2, 
ke =(1—aV/(N—1))¥?, or = (1 taV/(N—1))"2, ke=(L—a(d?+1)/ 










Je Qe—1))/? and o?=ype, pya=(a?+1)me. If the expression for kp? is 
negative we replace kz by 0; for the result to be best possible we require 
respectively aAS1, aV/(A?—1)S1 or a<+/(\—1), the distribution 
for which equality is obtained being a discrete one whose spectrum
has not more than six points. 
















For distributions of finite range the above inequalities can be improved
on. If z is the maximum value of $|x - \mu|$ then Lurquin [21]
gives

(vi) $$P(\mu - t\sqrt{\mu_2}, \mu + t\sqrt{\mu_2}) \le 1 - \mu_2(1 - t^2)/z^2.$$

This is not best possible but can immediately be improved to
$P(\mu - t\sqrt{\mu_2}, \mu + t\sqrt{\mu_2}) \le 1 - \mu_2(1 - t^2)/(z^2 - t^2\mu_2)$ (t < 1) which is best
possible.







RESULTS DEPENDING ON EXPECTATIONS OF GENERAL FUNCTIONS 






As a generalization of knowledge of moments of a distribution we 
may suppose a knowledge of the expectations of more general functions; 
results are available only when the number of functions is one or two. 
(vii) Let g(x) be a positive function of x, increasing for $x \ge d \ge 0$.
Then if $m_g = \int_0^{\infty} \{f(x) + f(-x)\}\, g(x)\,dx$ we have

$$P(-d, d) \ge 1 - m_g/g(d).$$


This seems to have been first stated by Cantelli [9]. 

The following result for the case when the expectations of two functions
are known was established in a very elegant way by von Mises
[24].
(viii) Let $g(x)$, $h(x)$ be two increasing functions of $x$ for positive $x$,
such that $g(0)=h(0)=0$ and the curve $\Gamma$ given by the parametric
equations $x=g(t)$, $y=h(t)$ is concave upwards (i.e., having a tangent
whose gradient increases with $x$). Let the frequency function of $x$ be
zero for negative $x$ and let $\bar g = E(g(x))$, $\bar h = E(h(x))$. In the
$(x, y)$ plane let $C$ be the point $(\bar g, \bar h)$, $Q(p, q)$ the point of
$\Gamma$ for which $t$ is largest (this is possibly a point at infinity for
which only the ratio $p/q$ is finite), $t_1$ the parameter of the point at
which $OC$ meets $\Gamma$, $t_0$ the parameter of the point at which $QC$
meets $\Gamma$, $M$ the point of $\Gamma$ with parameter $d$, and let $CM$
meet $\Gamma$ in the point $(g', h')$. Then

for $d \le t_0$,
$$0 \le P(0,d) \le 1 - \frac{\bar g - g(d)}{g' - g(d)};$$

for $t_0 \le d \le t_1$,
$$\frac{(\bar g - p)(\bar h - h(d)) - (\bar g - g(d))(\bar h - q)}{p\,h(d) - q\,g(d)} \le P(0,d) \le 1 - \frac{\bar g\,h(d) - \bar h\,g(d)}{p\,h(d) - q\,g(d)};$$

for $t_1 \le d$,
$$\frac{\bar g - g(d)}{g' - g(d)} \le P(0,d) \le 1.$$

The above inequalities are best possible, being satisfied with equality
for certain distributions with not more than three different values of $x$.
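To make result (viii) concrete, the sketch below specializes it to the simplest case $g(x) = x$, $h(x) = x^2$ for a variate confined to $[0, 1]$, so that the curve is the parabola $y = x^2$ and $Q = (1, 1)$. This specialization, the function name and the closed forms for the two threshold parameters (where the lines $QC$ and $OC$ meet the parabola) are assumptions worked out here for illustration, not formulas quoted from von Mises.

```python
def two_moment_bounds(m1, m2, d):
    """Bounds (lower, upper) on P(0, d) for a variate on [0, 1] with
    E(x) = m1 and E(x^2) = m2, i.e. result (viii) with g(x) = x, h(x) = x^2."""
    if not (0 < m1 < 1 and m1 * m1 < m2 < m1):
        raise ValueError("moments are not attainable on [0, 1]")
    if not 0 < d < 1:
        raise ValueError("d must lie strictly between 0 and 1")
    t_qc = (m1 - m2) / (1 - m1)   # parameter where the line QC meets the parabola
    t_oc = m2 / m1                # parameter where the line OC meets the parabola
    if t_qc <= d <= t_oc:
        # Three-point case: probability at O, M = (d, d^2) and Q = (1, 1).
        denom = d * d - d         # p*h(d) - q*g(d) with p = q = 1
        lower = ((m1 - 1) * (m2 - d * d) - (m1 - d) * (m2 - 1)) / denom
        upper = 1 - (m1 * d * d - m2 * d) / denom
        return lower, upper
    # Two-point case: probability at M and at the second intersection (g', h') of CM.
    g_prime = (m2 - d * m1) / (m1 - d)
    ratio = (m1 - d) / (g_prime - d)
    return (0.0, 1 - ratio) if d < t_qc else (ratio, 1.0)


# Moments of the rectangular distribution on [0, 1], for which P(0, d) = d.
for d in (0.2, 0.5, 0.8):
    print(d, two_moment_bounds(0.5, 1.0 / 3.0, d))
```

In the two-point cases the expressions reduce to the familiar one-sided form $(m_1 - d)^2 / E(x - d)^2$, which gives an independent check on the algebra.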


A RESULT USING MEAN RANGE 


The mean range in samples of n from the population is a function 
which is not covered by any of the above results. Winsten [39] has 
used it to obtain an inequality as follows. 

(ix) Let $w_n$ be the mean range in a sample of $n$; let $t$ be given and $m$
determined by


ie di, ee 
2 < Be “Gad -0-s 


Let p be determined by 


ttm 3 {1 — Gp) — (1 ip}. 


Then

$$\underset{x}{\operatorname{l.u.b.}}\ P(x,\ x + t w_n) \ge p.$$


This inequality is different in character from the preceding ones, 
since it relates not to a single value of P(a, b) but to the least upper 
bound of a certain set of values. 

The inequality is replaced by equality for a certain distribution with 
(m+1) distinct values. 


DISTRIBUTIONS WITH GEOMETRICAL RESTRICTIONS 


All the inequalities which we have so far considered can be replaced
by equality for distributions with probability at a finite number of
points. We can improve the inequalities by excluding such distributions
and this we do by restricting the number of maxima of the frequency
function, or by imposing conditions of monotonicity or convexity over
various parts of the range of the variate. Most of the results which
follow are also best possible, equality being attained when the graph
of the frequency function consists of rectangular blocks, possibly with
extra positive probabilities at certain points superimposed.

A fairly general condition is that of von Mises [23]. We replace a 
general distribution by one with non-zero z as explained in the intro- 
ductory remarks about absolute moments and require that for r>2o 
the graph of P(0, x) shall lie below the line joining (z, 1) and (x, 
P(0, 11)). For a given absolute moment », we define ¢, ¢ by 


(r + LSS — 1) = srt — art? 


(r + 1)v.(r — xo) = 7°! — aort!, 


(x) Then 
P(O, 2X) =1- v,/t" 


and if v», >20" and ¢ <7 then we have also 
P(O, 21) = (a —- Xo) /(r ad Xo). 


A particular case of von Mises’ condition is when $f(x)$ is monotonically
decreasing in some range; more generally than this we may require
some derivative of $f(x)$ to vary monotonically. Van Dantzig
[12] proved the following result.
(xi) Let x be a non-negative variate and denote P(z, ©) by R(z). 
Ina<z<Xb, where 0 Sa<b< —, let (—1)/R“(z) be positive forOSj Sh 
and non-increasing for j7=h. Let h* be the integral part of (b—20)f (xo) 
/P(xo, 6), H=min (h, h*), r=xo+HP(2o, b)/f(xo), 


p*(1 — p)# 





b 
a,’ -f (a* — a*)f(x)dz, Tula) = max : . 
; eS ef eta — ORae 


Then 
P(O, 20) 2 P(O, b) — Ta(a/r)ax’/xo*. 

Except when a=0 this involves P(0, xo) on both sides of the inequal- 
ity, as well as using f(x). A less powerful result, which avoids these 
disadvantages, is 
(xii) 

H 


f x*f(x)dx 


sae P(a, b), 
Xo xo'P(a, b) 





P(O, x) 2 P(O, b) — | Sux 





932 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1955
where (a,C) satisfies 





> Hk! 1-—@¢ 7 
pe - —j)r-1 @2-i | —_—. } J... a) = 
FN Ee” eee ( = ) — 


H-1 


Iyn(a) = > HEI) satti(1 — a) 8-1-1 


i=0 


(H + k)!k*H* 
H'k\(H + k)#+* 





vos = Tay4(0) = 


More simply still we have 
(xiii) 
, 


x*f(x)dx ‘ P(a, by 


ro*(1 + A) 1+)r7 





P(O, Zo) P - P(O, b) — Tana 


-_ k*H® (a/2)*t® re Te4i,0(@/20) 
~ (k+ A)® (1 —a/x)*  (a/ao)#+# 


When h=1 the results in (xii) and (xiii) are the same. 
If in (xiii) we put h=1, k=2r, b= — and neglect fo*zx*f(x)dz, we 
obtain 


(xiv) P(O, to) 2 1 — 0 — ve,/((2r + 1)/2r)**(1 + rA)ao” 


where 





a 2r ar Zo 
6 = P(O, a)/(1 + »~), A= (“ er :) /(2r + 1) (= — 1) 


This is an earlier result proved by Camp in 1922. If $a=0$ and $h=1$ we
have

(xv) $$P(0, x_0) \ge 1 - \frac{\nu_k}{x_0^k}\left(\frac{k}{k+1}\right)^k,$$

which was proved by Meidell in 1922. Meidell also [22] showed that
(xvi) if $x$ takes the values $A, A+1, \dots, B$ with probabilities
$f(A), \dots, f(B)$ and if, in the interval $|x| < c(1+n^{-1})$, these
probabilities have a single maximum at 0, then the sum of the probabilities in
the interval $|x| \le c$ is not less than


Vn n “ag 
1-4 y. 
ce*\n +1 


A particular case of (xv) is $P(0, d) \ge 1 - 4\nu_2/9d^2$, or, for a distribution
with a variate not necessarily positive,

$$P(-d, d) \ge 1 - 4\mu_2(0)/9d^2;$$

the condition on the frequency function is that it should have a
maximum at $x=0$. This result was conjectured by Gauss.
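The gain from the monotonicity condition in (xv), $P(0, x_0) \ge 1 - (\nu_k/x_0^k)\,(k/(k+1))^k$, over the plain moment inequality can be seen numerically. The sketch below is illustrative only: the function names are made up here, and the rectangular distribution on $[0, 1]$ is chosen simply because its moments and probabilities are known exactly.

```python
def camp_meidell_bound(nu_k, x0, k):
    """Lower bound 1 - (nu_k / x0^k) * (k/(k+1))^k on P(0, x0)."""
    return 1.0 - (nu_k / x0 ** k) * (k / (k + 1)) ** k


def moment_bound(nu_k, x0, k):
    """Plain moment bound 1 - nu_k / x0^k, for comparison."""
    return 1.0 - nu_k / x0 ** k


# Rectangular distribution on [0, 1]: nu_2 = 1/3 and P(0, d) = d exactly.
nu2 = 1.0 / 3.0
d = 0.9
print("actual P(0, d):      ", d)
print("with factor (2/3)^2: ", camp_meidell_bound(nu2, d, 2))
print("plain moment bound:  ", moment_bound(nu2, d, 2))
```

With $k = 2$ the factor is $4/9$, which is the constant appearing in the Gauss form of the inequality.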

Selberg [32] has shown that
(xvii) if $f(x)$ has a single maximum not necessarily at $x=0$, then

$$P(-d, d) \ge 1 - \theta\mu_2(0)/d^2,$$

where $\theta = .565376\ldots$ satisfies $\theta^3 - 9\theta^2 + 3\theta + 1 = 0$.
(This result is not best possible.) He also [34] gives a more complicated
result in terms of other moments.
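The constant in (xvii) can be checked directly from its defining cubic, $\theta^3 - 9\theta^2 + 3\theta + 1 = 0$. The following short Python sketch finds the root lying between 0 and 1 by bisection; the function name and the bracketing interval are choices made here for illustration.

```python
def selberg_theta(tol=1e-12):
    """Root of theta^3 - 9*theta^2 + 3*theta + 1 = 0 lying in (0.5, 1), by bisection."""
    def f(x):
        return x ** 3 - 9 * x ** 2 + 3 * x + 1

    lo, hi = 0.5, 1.0              # f(0.5) = 0.375 > 0 and f(1.0) = -4 < 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


theta = selberg_theta()
print(round(theta, 6))
```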
Royden [30] gives inequalities for a unimodal distribution in terms
of the first and second absolute moments about the mode (which we
take to be at the origin). The results are:

(xviii) $$P(0,d) \le 1 - \frac{(2\nu_1 - d)^2}{3\nu_2 - 2d\nu_1} \quad \text{for } 0 \le d \le 2\nu_1,$$

$$P(0,d) \le 1 \quad \text{for } 2\nu_1 < d,$$
and 
P(0, d) 2 d/2, for 0 s d < VY, 
P(O, d) 2 1 — »,/2d for » Sd S 3n,/4n, 
4y,2 83d 3 
PRORI<-=—+——= te = edee 
Vo Qyq? 4y, a 
A-1 


P(0,d)21-— for — Sd, 
3t? — 4 +A V1 


where A=3r,/4»,? (and is necessarily > 1) and 
&— ¢? 
3t? — 4t +A 








4n, 







(By a slip, Royden omits a 2 in the first inequality and has $3\nu_2 - d\nu_1$.)
Smith [36, 37] considers the distribution of a non-negative variate 
whose frequency function has a single maximum at c and obtains 


Voy — c7°(1 — 2rc/d(2r + 1)) 


(xix) P(0,d) 21-—- 
(d/0) — c*(1 — 2rc/d(2r + 1)) 





for d <c/P(0, oc), 


where @ is defined by d=2r{ d*+1—(c9)*+1} / (2r+-1)0 {d*—(c6)?}. 
If the frequency function of $x$ has a single maximum at $x=0$ and is
symmetrical about $x=0$ then Winsten [39] shows that
P(—twnp, twa) 
(xx) 1+ yt — (1-4 


n+l 





21-4} - yt — ya - wt, 


1 
where >" 1—(1—y)*—y* and w, is the mean range in a sample of n, 


Narumi [25] considers the case of the frequency function of a non- 
negative variate which is monotonic in a range starting at the origin 
(previous results of this kind have imposed conditions on the tail of 
the distribution). 

(xxi) If $f(x)$ is non-decreasing in $0 \le x \le b$, where $\nu_n < b^n < (n+1)\nu_n$,
then

$0 \le P(0, d) \le d/b$ for $0 \le d < b_1$,
d b—d 1)», — b* d 
Oe smmasl we aeees 
b b" — d” b b 
' (n + 1)y, — b” 
(n + 1)d" — &” 
where $b_1$ is the positive root of $d(b^n - d^n)/(b - d) = (n+1)\nu_n - b^n$. For
$b^n = (n+1)\nu_n$ these results have the simple limiting form $P(0, d) = d/b$
for $0 < d \le b$, and $P(0, d) = 1$ for $b < d$. The case $b^n > (n+1)\nu_n$ is
impossible.
(xxii) If f(x) is non-increasing in 0Sz3Sb, where b*>», then 

(n + 1)d 

nb 








= P(0,d)<1 for bd, 


(1-=) 5 P00 51 for 0<d<nb/(n +1), 







1- =" 5 PO,d) S1 for nb/(m+1) sdb, 


1-*sP,d) <1 for bd. 


DISTRIBUTIONS OF AVERAGES OR TOTALS 


We have so far used information about the distribution of $x$, but
nothing about the nature of $x$. We shall now consider a group of
inequalities which deal with the case in which $x$ is a sum of values, each
with a probability distribution. Many of the above inequalities can now
be improved upon since for sufficiently large values of $n$ the sum of $n$
values cannot take a small number of discrete values. Some of the
impetus for work in this field has been the Law of Large Numbers, which
deals with the limiting form of the distribution of a sum, as the number
of components becomes large, and several of the inequalities obtained,
though adequate for their application to this theory, involve unknown
constants which render them useless from the practical point of view.
None of the inequalities which follow is best possible, except possibly
for some special values of the parameters.

We write $Q(a, b)$ for the probability of the inequality $a \le \bar x \le b$,
where $\bar x$ is the average of the $n$ values added, and any moments or other
data used in the definition of $a$ or $b$ are those of the parent populations.
We assume that the values which are added are independent unless
otherwise stated.

Guttman [14] showed that 


(0-0 t/a) 
VES tm Coa) Ee 














jo Zn~ ee 
n 


and the z’s are from the same population. 

Robbins [28] showed that
(xxiv) if $\mu = 0$, $\mu_2 = 1/n$ and l.b. $Q(-t/n, t/n)$ over all distributions
(the $x$’s coming from the same distribution) is $1 - \phi_n(t)$ then
$\phi_n(t) < t^{-2}$ (strict inequality) for $n > 1$ and $t > \sqrt{n}$ but that
$t^2\phi_n(t) \to 1$ as $t \to \infty$ for all $n$. He proposed the interesting
problem of finding

$$\lim_{n \to \infty} \phi_n(t)$$

for all $t$.

Bernstein [3] imposed the following condition on the moments of
each of the distributions from which values $x_1, \dots, x_n$ are taken:
$|\mu_r| \le \tfrac12 H^{r-2}\, r!\, \mu_2$ for all $r \ge 2$, and some constant $H$.
Then if $\mu = 0$ for each of $x_1, \dots, x_n$ and $B_n$ is the variance of
$x_1 + \dots + x_n$ so that $B_n = \sum \mu_2$, we have

(xxv) $$Q(-d/n,\ d/n) \ge 1 - 2\exp\{-d^2/(2B_n + 2Hd)\}.$$

If the $x$’s have finite range so that $|x_i| \le M$ then we may take $H = M/3$.

By using the idea of conditional expectation (e.g., expectation of
$x_r$ relative to given values of $x_1, \dots, x_{r-1}$) he was able in [4] to remove
the condition that $x_1, \dots, x_n$ be independent.
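The practical difference between the exponential bound (xxv), $1 - 2\exp\{-d^2/(2B_n + 2Hd)\}$, and the second-moment bound $1 - B_n/d^2$ for a sum can be seen in a small calculation. In the sketch below the function names are illustrative, and the example of 100 rectangular variates on $(-1, 1)$ with $H = M/3$ is an assumption chosen only to give round numbers.

```python
import math


def bernstein_bound(d, B_n, H):
    """Lower bound 1 - 2*exp(-d^2 / (2*B_n + 2*H*d)) on Q(-d/n, d/n)."""
    return 1.0 - 2.0 * math.exp(-d * d / (2.0 * B_n + 2.0 * H * d))


def second_moment_bound(d, B_n):
    """Lower bound 1 - B_n/d^2 from Tchebychef's inequality applied to the sum."""
    return 1.0 - B_n / (d * d)


n, M = 100, 1.0
B_n = n * M * M / 3.0      # each rectangular variate on (-M, M) has variance M^2/3
H = M / 3.0                # permitted choice when |x_i| <= M
d = 20.0

print("exponential bound:  ", bernstein_bound(d, B_n, H))
print("second-moment bound:", second_moment_bound(d, B_n))
```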

Craig [11] replaced ordinary moments by moments over some finite 
interval (—b, b), where b is arbitrarily large, and showed that a further 
possible condition is 


(xxvi) $$|\lambda_r| \le \tfrac12 H^{r-2}\, r!\, \mu_2,$$


where $\lambda_r$ is the $r$th cumulant (again over a finite interval if desired).
Offord [26] obtained a lower bound for probability in an interval as 
follows; let the moments of 1, + - - , 2, satisfy 


(xxvii) pa)!2/ygi!8 > 2k1/8, 


Then 





n 


t— t 61 k 
l.u.b. Q ( ~ , + *) s — (toe n+ ——a): 
t n 


k8./n min pe)/? 


the min in the bracket being over the distributions of $x_1, \dots, x_n$.
He also showed that the $n$, $k$, $\min \mu_2^{1/2}$ can be replaced by the same
quantities derived from a subsequence of at least two terms from
$x_1, \dots, x_n$.

As $n$ increases, the distribution of $\bar x$, under certain conditions, tends
to normality and so it is possible to estimate $Q(a, b)$ in terms of the
normal probability integral. If for the distribution of $x_i$ we have
$\mu = 0$, $\mu_2 = \mu_2(i)$, $\nu_3 = \nu_3(i)$ and

$$B_n = \sum_i \mu_2(i),$$

then Berry [5] proved that

(xxviii) $$\left| Q\!\left(-\infty,\ \frac{t\sqrt{B_n}}{n}\right) - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-u^2/2}\,du \right| \le \frac{1.88}{\sqrt{B_n}}\, \max_i \frac{\nu_3(i)}{\mu_2(i)}.$$


Berry’s paper is not without errors, as was remarked by Hsu [15, p. 3],
but a referee tells me that the constant 1.88 is claimed to be still
correct after the errors have been rectified. Bergström [2] proved a similar
result with right-hand side

(xxix) $$(4.8) \sum_i \nu_3(i) / (B_n)^{3/2},$$

which is less good than Berry’s for the case when the $x$’s have identical
distributions but is stronger in certain cases when they have not.
Results of this type can also be proved using absolute moments other
than the third order one, and for multivariate distributions, but then
the constants which occur have not been evaluated numerically.
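The comparison between the two right-hand sides, $1.88\,\max_i\{\nu_3(i)/\mu_2(i)\}/\sqrt{B_n}$ and $4.8\sum_i \nu_3(i)/B_n^{3/2}$, can be illustrated with a short Python sketch. The function names and both sets of moments below are made up for illustration: the first set has identical distributions, the second has one component with a much larger spread than the rest.

```python
import math


def berry_rhs(mu2, nu3):
    """1.88 / sqrt(B_n) * max_i nu3(i)/mu2(i), with B_n the sum of the variances."""
    B_n = sum(mu2)
    return 1.88 / math.sqrt(B_n) * max(v / m for v, m in zip(nu3, mu2))


def bergstrom_rhs(mu2, nu3):
    """4.8 * sum_i nu3(i) / B_n^(3/2)."""
    B_n = sum(mu2)
    return 4.8 * sum(nu3) / B_n ** 1.5


# Identical case: 100 variates taking the values -1 and +1 with equal probability.
mu2_same, nu3_same = [1.0] * 100, [1.0] * 100

# Non-identical case (hypothetical): 10000 such variates plus one taking -10 and +10.
mu2_mixed = [1.0] * 10000 + [100.0]
nu3_mixed = [1.0] * 10000 + [1000.0]

print(berry_rhs(mu2_same, nu3_same), bergstrom_rhs(mu2_same, nu3_same))
print(berry_rhs(mu2_mixed, nu3_mixed), bergstrom_rhs(mu2_mixed, nu3_mixed))
```

In the identical case the first expression is the smaller; in the mixed case the maximum ratio is dominated by the single wide component and the second expression is the smaller.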

By comparing a variate with a finite range with a rectangularly dis- 
tributed variate, Birnbaum [7] obtained 
(xxx) if the $x$’s have the same symmetrical, unimodal distribution
with range 2a then 


Q(—d, d) el- v,(d/a), 


where 


ic (-pran {2 e+ - Hh 


n §n(t+1)<kSn 


Some inequalities involving cumulative sums are due to Kolmogoroff
[17, 18]. Here we consider not only the average $S_n/n$ of $n$ quantities
$x_1, \dots, x_n$ but also the sub-totals $S_k\ (= x_1 + \dots + x_k)$ and find bounds
for the probability that at least one of these exceeds a given amount,
or in other words that the largest one does.

(xxxi) If M is maxisisn |2:—E(x:)|, uis maxigesn |S,—E(S,)| and 







D*= E(S, — E(S,))? then Pr(u2 R) S$ D*/R? while Pr(u=mR) <D/(R 
—M)™ if m is an integer and R>M. Also Pr(maxis-sn |S,| <R) 
<4(M+R)?/D*. For D2M we have Pr(|S,| 24D) 21/1600, (ie, 
Q(— «©, —D/2n)+Q(D/2n, ©) 21/1600 in the notation used earlier) 
and from this follows trivially 


1 
Q(—©, —R/n) + Q(R/n,'«) = ame 


M? + =) 
600 


D2 


for any $R$, $M$, $D$. The constant 1/1600 could be improved on even
without departing from Kolmogoroff’s proof.

Other results on means involve unknown constants and will not be 
given here. 


MULTIVARIATE DISTRIBUTIONS 


Some work has been done in the direction of extending the above 
results to n-variate distributions for n>1; the problem is now to find 
bounds for the amount of probability falling in some region of space of 
n dimensions. Owing to the different kinds of region which are possible 
we can use no unifying notation as in the case of one dimension where 
only intervals have to be considered. 

We suppose that the mean value of each variate is zero and write
$\mu_{r \cdots t} = E(x_1^r \cdots x_n^t)$. As a special case for $n=2$ it is convenient
to write $\mu_{20} = \sigma_1^2$, $\mu_{02} = \sigma_2^2$, $\mu_{11} = \rho\sigma_1\sigma_2$.

(xxxii) Berge [1] has shown that the probability that $(x_1, x_2)$ falls
in the rectangle $|x_1| < k\sigma_1$, $|x_2| < k\sigma_2$ is not less than
$1 - (1 + \sqrt{1-\rho^2})/k^2$ and that this is best possible.
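The effect of the correlation in (xxxii), $1 - (1 + \sqrt{1-\rho^2})/k^2$, is easy to tabulate. The Python sketch below is illustrative; the function name is an assumption introduced here.

```python
import math


def berge_bound(k, rho):
    """Lower bound 1 - (1 + sqrt(1 - rho^2)) / k^2 on the probability that
    |x1| < k*sigma1 and |x2| < k*sigma2 hold together."""
    return 1.0 - (1.0 + math.sqrt(1.0 - rho * rho)) / (k * k)


for rho in (0.0, 0.5, 0.9, 1.0):
    print(rho, round(berge_bound(3.0, rho), 4))
```

For $\rho = 0$ the bound is $1 - 2/k^2$, and as $|\rho|$ approaches 1 it approaches the one-variable value $1 - 1/k^2$.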

(xxxiii) Pearson [27] showed that the probability that (x, 22) falls 
in the ellipse :2?/0;? —26221%2/0102-+ 022%27/02?=o?(1—p*?) is not 
less than 1 —I,/Xo"* where 


1 : 1422 aa? 
I,= ff (6 =. — 2612 —— + O22 =) $e, X2)dx,dx2, 
ad P o2? 


o1 0102 


this integral being expressible in terms of moments of the distribution.
The result is best possible, equality being obtained when the probability
is zero except on an ellipse or at its center. This case is excluded if
we require the distributions of $x_1$ and $x_2$ to be independent and
Birnbaum, Raymond and Zuckerman [6] give the best possible result in
this case as follows.

(xxxiv) The probability that $(x_1, x_2)$ lies in







is not less than $1 - P(s, t)$ where, assuming that $\sigma_1^2/s^2 \le \sigma_2^2/t^2$,
$P(s,t) = 1$ if $\sigma_1^2/s^2 + \sigma_2^2/t^2 \ge 1$











Leta ts EY a 


These are obtained with equality for distributions of $x_1$ and $x_2$ with
only two or three values, and the proof is obtained by showing that the
general case can be reduced to one of this kind.

Chapelon called the ellipsoid peo... ot?+2un0o * * - ote t- +> =1 
a quadrique type, and the parallelepiped which circumscribes this and
has faces parallel to the axial planes a parallélépipède type. He showed
[10] that
(xxxv) the probability that $(x_1, \dots, x_n)$ lies inside the region
obtained by dilating the quadrique type or parallélépipède type about its
center by a scale factor $\lambda$ is not less than $1 - n/\lambda^2$.

Leser [19] considered an ellipsoid with axes parallel to the coordinate 
planes and also imposed a monotonicity condition. 
(xxxvi) Let P be the probability that 





n/a? = p 1/o;?, n/do? = > 1/),?. 
Let A(Ro) be the average probability density in the space 
vy 8 
Ao%ao? >, (=) = R,? 
Ass 


and suppose that A(R) isa non-increasing function of R for R SkoovV/n. 
Then if k<1 we have 
P20 for AS] 


P=1-—dz,. for 138 Xo: 


we have 


2 
ae 22 (1 
2 


i 


1 
P21-— 
k2 

1 
P21-— 
Xo? 


we have 


nN n/2 , n/n + 2\1/2 
ra(Sy' mw us 2)” 
n+2 n+2 n 

Dy) Qin J 2 1/n 1/2 
rays» GE) 
n+2 Ao? n+2 n 










for kS Xo. 


Romanovski [29] gave an inequality concerning deviations of observed
frequencies from true ones. Let there be $n$ mutually independent
trials each with $s$ mutually exclusive results and let $p_i$ be the probability
of the $i$th result and $q_i$ the observed relative frequency. Then
(xxxvii)

$$\Pr\Big\{ \sum_i (q_i - p_i)^2 < \epsilon \Big\} > 1 - \Big(1 - \frac{1}{s}\Big)\Big/ n\epsilon,$$

while to compare two sets of trials (with the same $p_i$) we have that if
$n'$ and $n''$ trials give observed relative frequencies $q_i'$ and $q_i''$ then

$$\Pr\Big\{ \sum_i (q_i' - q_i'')^2 < \epsilon \Big\} > 1 - \frac{1}{\epsilon}\Big(1 - \frac{1}{s}\Big)\Big(\frac{1}{n'} + \frac{1}{n''}\Big).$$
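The first inequality in (xxxvii), with right-hand side $1 - (1 - 1/s)/n\epsilon$, can be compared with a simulated frequency. The Python sketch below is only an illustration: the function names, the seed, the number of repetitions and the choice of four equally likely results are all assumptions made here.

```python
import random


def romanovski_bound(n, s, eps):
    """Lower bound 1 - (1 - 1/s) / (n * eps) on Pr{ sum_i (q_i - p_i)^2 < eps }."""
    return 1.0 - (1.0 - 1.0 / s) / (n * eps)


def simulated_frequency(n, p, eps, repetitions=500, seed=0):
    """Fraction of simulated sets of n trials in which sum_i (q_i - p_i)^2 < eps."""
    rng = random.Random(seed)
    s = len(p)
    hits = 0
    for _ in range(repetitions):
        counts = [0] * s
        for outcome in rng.choices(range(s), weights=p, k=n):
            counts[outcome] += 1
        distance = sum((c / n - p_i) ** 2 for c, p_i in zip(counts, p))
        if distance < eps:
            hits += 1
    return hits / repetitions


p = [0.25, 0.25, 0.25, 0.25]
print("bound:    ", romanovski_bound(200, len(p), 0.02))
print("simulated:", simulated_frequency(200, p, 0.02))
```

As with most inequalities of this type, the bound is conservative: the simulated frequency is typically far closer to 1 than the bound requires.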
Camp [8] gave a best possible generalization of Tchebychef’s in- 
equality in terms of new statistics which he called contour moments. 
Let Q,be the set of points at which f(m, - - - , 22) > and let x, be the 
measure of Q,. Let the frequency function be bounded above by A 
so that z4=0 while x» S ©. From the monotonic decreasing function 
\ of z we obtain a function y(x) which is defined for all z by making 
y(z) = min A(z) if X is multiple-valued at z, while y(z) is the value of 
\ at the beginning of an interval in which \ is undefined. Let 7, 
= fo*°xty(x)dx. Then 
(xxxviii) the probability that (m, ---, 2.) belongs to Q, is not less 
than 


1. 2 y" 


x \2r + 1 


A NOTE ON FUTURE DEVELOPMENTS 


Although, as shown above, much work has been done, much still 
remains to do. From a purely mathematical point of view the possi- 
bilities for further work are endless, since the ideal situation is one in 
which there is a best possible inequality for every combination of data 
about the population. From the practical point of view the types of 







data which need be considered are more limited, but as regards one- 
dimensional distributions it would be useful to have inequalities for any 
intervals in terms of any number of moments. The work of Selberg 
[33] is a step in this direction; a note in [35, p. x], suggests that Mark- 
off solved this problem but the solution does not seem to be extant. 
It would be interesting also to impose a restriction on the number of 
modes in the moment problem (thus continuing Royden’s work in 
[30]), which would exclude the present solution for all sufficiently large 
n, or to find more inequalities depending on geometrical conditions.
For distributions in more than one dimension there are possibilities of 
generalizing almost all the results belonging to one dimension. 


A NOTE ON APPLICATIONS 


The application of the inequalities listed in the preceding pages is, 
as indicated in the Introduction, to the estimation of the probability 
of observations falling in given ranges, in terms of some known proper- 
ties of the distribution of the observations. Such problems arise in the 
setting of quality control limits, or in the devising of tests of
hypotheses. Suppose, for example, that we want to test the hypothesis that the
mean value of x is 0, and we know that its variance is 1. (Such a
knowledge of variance could be derived from past experience; measurements
on mass-produced items may have fairly stable variances, but their
mean values may change owing to tool-wear.) From Tchebychef’s
inequality the probability that |x| > 10, when the mean value of x
is 0, is less than 1/100, so that if we reject the hypothesis if an
observation is numerically larger than 10 we have a test with significance level
1 per cent. If we know that the distribution of x is unimodal then
Selberg’s result (xvii) gives us that the probability of |x| > 7.52 . . . is less
than 1/100. If we know that the distribution of x is normal then the
probability that |x| > 2.57 . . . is less than 1/100, from tables of the
normal probability integral. The more we know about the distribution
the more we can contract the bounds on |x|. If we know that the
variance is not greater than 1 then we can use Tchebychef’s inequality as
above; a more precise value of the variance (say .81) would lead to the
narrower bounds |x| > 9. Thus in place of exact knowledge of the
moments used in the statements of theorems we can use inequalities 
satisfied by them. 
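The rejection limits quoted in this example can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the variable names are made up, the constant .565376 is taken from the statement of (xvii), and the normal limit is obtained from the standard library's `statistics.NormalDist` rather than from printed tables.

```python
import math
from statistics import NormalDist

alpha = 0.01          # significance level of 1 per cent
variance = 1.0
THETA = 0.565376      # constant quoted in result (xvii)

# Tchebychef: Pr(|x| > c) <= variance / c^2, so c = sqrt(variance / alpha).
tchebychef_limit = math.sqrt(variance / alpha)

# Unimodal case: Pr(|x| > c) <= THETA * variance / c^2.
unimodal_limit = math.sqrt(THETA * variance / alpha)

# Normal case: two-sided limit from the normal probability integral.
normal_limit = NormalDist(0.0, math.sqrt(variance)).inv_cdf(1.0 - alpha / 2.0)

# Tchebychef again with the more precise variance .81.
tighter_limit = math.sqrt(0.81 / alpha)

print(tchebychef_limit, unimodal_limit, normal_limit, tighter_limit)
```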

As regards geometrical conditions, which are difficult to verify, it 
may be worth remembering that in much statistical work we act on 
the principle that premises which are nearly correct lead to conclusions 







which are nearly correct; for example, although no variate met with in 
practice can be exactly normally distributed, the normal distribution 
is often a sufficiently good approximation for tables of significance de- 
rived from it to be usefully employed. Similarly an inequality based on 
the assumption that a frequency function has only one maximum may 
reasonably be used when the frequency function is “nearly unimodal.” 
What is an unsatisfactory state of affairs to the pure mathematician 
may be the only one available to the practicing statistician. 


REFERENCES 


The references which follow refer only to the most general result of each type 
and work which has been superseded is not mentioned. A much fuller bibliog- 
raphy of the subject, which has been of the greatest assistance in compiling the 
present paper, was given by Savage [31]. I am also indebted to referees for a few 
references. Papers of which my knowledge is derived only from quotation else- 
where are marked*. 

[1] Berge, P. O. “A note on a form of Tchebycheff’s theorem for two variables,” 

Biometrika, 29 (1937) 405-6. 

[2] Bergström, H. “On the central limit theorem in the case of not equally
distributed random variables,” Skandinavisk Aktuarietidskrift, 32 (1949)
37-62.

*[3] Bernstein, S. “Sur une modification de l’inégalité de Tchebichef,” Ann. Sc.
Instit. Sav. Ukraine, Sect. Math. I (1924). (Russian, French Summary).

[4] Bernstein, S. “Sur quelques modifications de l’inégalité de Tchebycheff,”
Comptes Rendus (Doklady) de l’Académie des Sciences de l’URSS, 17 (1937)
279-82.

[5] Berry, A. C. “The accuracy of the Gaussian approximation to the sum of
independent variates,” Transactions of the American Mathematical Society, 
49 (1941) 122-36. 

[6] Birnbaum, Z. W., Raymond, J. and Zuckerman, H. S. “A generalization of 
Tshebyshev’s inequality to two dimensions,” Annals of Mathematical Sta- 
tistics, 18 (1947) 70-9. 

[7] Birnbaum, Z. W. “On random variables with comparable peakedness,” 
Annals of Mathematical Statistics, 19 (1948) 76-81. 

[8] Camp, B. H. “Generalization to N dimensions of inequalities of the Tche- 
bycheff type,” Annals of Mathematical Statistics, 19 (1948) 568-74. 

*[9] Cantelli, F. P. “Intorno ad un teorema fundamentale della teoria del
rischio,” Bolletino dell’Associazione degli Attuari Italiani, Milan, (1910) 1-23.

[10] Chapelon, J. M. “Sur l’inégalité fondamentale du calcul des probabilités,” 
Bulletin de la Société Mathématique de France, Paris, 65 (1937) 100-8. 

[11] Craig, C. C. “On the Tchebycheff inequality of Bernstein,” Annals of 
Mathematical Statistics, 4 (1933) 94-102. 

[12] van Dantzig, D. “Une nouvelle généralisation de l’inégalité de Bienaymé,” 
Annales de l’Institut Henri Poincaré, 12 (1951) 31-43.

[13] Guttman, L. “An inequality for kurtosis,” Annals of Mathematical Statistics,
19 (1948) 277-8.







[14] Guttman, L. “A distribution-free confidence interval for the mean,”
Annals of Mathematical Statistics, 19 (1948) 410-13.

[15] Hsu, P. L. “The approximate distributions of the mean and variance of a
sample of independent variables,” Annals of Mathematical Statistics, 16
(1945) 1-29.

[16] Khamis, S. H. “On the reduced moment problem,” Annals of Mathematical
Statistics, 25 (1954) 113-22.

[17] Kolmogoroff, A. N. “Über die Summen durch den Zufall bestimmter
unabhängiger Grössen,” Mathematische Annalen, 99 (1928) 309-19.

[18] Kolmogoroff, A. N. “Bemerkungen zu meiner Arbeit ‘Über die Summen
zufälliger Grössen’,” Mathematische Annalen, 102 (1929) 484-8.

[19] Leser, C. E. V. “Inequalities for multivariate frequency distributions,”
Biometrika, 32 (1942) 284-93.

[20] Lurquin, C. “Sur une inégalité fondamentale de probabilité,” Comptes
Rendus, Paris, 187 (1928) 868-70.

[21] Lurquin, C. “Sur un théorème de limite pour la probabilité au sens de
Bienaymé-Tchebycheff,” Bulletin Acad. Bruxelles, (5) 14 (1928) 641-58.

[22] Meidell, B. “Sur la probabilité des erreurs,” Comptes Rendus, Paris, 176 
(1923) 280-2. 

[23] von Mises, R. “Sur une inégalité pour les moments d’une distribution quasi- 
convexe,” Bulletin des Sciences Mathématiques (2), 62 (1938) 68-71. 

[24] von Mises, R. “The limits of a distribution function if two expected values 
are given,” Annals of Mathematical Statistics, 10 (1939) 99-104. 

[25] Narumi, S. “On further inequalities with possible application to problems 
in the theory of probability,” Biometrika, 15 (1923) 245-53. 

[26] Offord, A. C. “An inequality for sums of independent random variables,” 
Proceedings of the London Mathematical Society (2), 48 (1945) 467-77. 

[27] Pearson, K. “On generalized Tchebycheff theorems in the mathematical 
theory of statistics,” Biometrika, 12 (1919) 284-96. 

[28] Robbins, H. “Some remarks on the inequality of Tchebychef,” Courant 
Anniversary Volume, New York: Interscience Publishers, Inc., (1948) 
345-50. 

*[29] Romanovski, V. I. “On inductive conclusions in statistics,” Comptes
Rendus (Doklady) de l’Académie des Sciences de l’URSS (N.S.), 27 (1940)
419-21.

[30] Royden, H. L. “Bounds on a distribution function when its first n moments 
are given,” Annals of Mathematical Statistics, 24 (1953) 361-76. 

[31] Savage, I. R. “Bibliography of Nonparametric Statistics and Related 
Topics,” Journal of the American Statistical Association, 48 (1953) 844-906. 

[32] Selberg, H. L. “Über eine Ungleichung der Mathematischen Statistik,”
Skandinavisk Aktuarietidskrift, 23 (1940) 114-20.

[33] Selberg, H. L. “Zwei Ungleichungen zur Ergänzung des Tchebycheffschen
Lemmas,” Skandinavisk Aktuarietidskrift, 23 (1940) 121-5.

[34] Selberg, H. L. “On an inequality in mathematical statistics,” Norsk Mate- 
matisk Tidsskrift, 24 (1942) 1-12, (Norwegian). 

[35] Shohat, J. A. and Tamarkin, J. D. “The problem of moments,” American 
Mathematical Society, New York, 1943. 

[36] Smith, C. D. “On generalized Tchebycheff inequalities in Mathematical 
Statistics,” American Journal of Mathematics, 52 (1930) 109-26. 





[37] Smith, C. D. “On Tchebycheff approximation for decreasing functions,”
Annals of Mathematical Statistics, 10 (1939) 190-2.

[38] Wald, A. “Limits of a distribution function determined by absolute
moments and inequalities satisfied by absolute moments,” Transactions of the
American Mathematical Society, 46 (1939) 280-306.

[39] Winsten, C. B. “Inequalities in terms of mean range,” Biometrika, 33
(1946) 283-95.

[40] Zelen, M. “Bounds on a distribution function that are functions of mo- 
ments to order four,” Journal of Research of the National Bureau of Stand- 
ards, 53 (1954) 377-81. 





THE RANDOMIZATION THEORY OF 
EXPERIMENTAL INFERENCE* 


Oscar KEMPTHORNE 
Iowa State College


The paper contains a description of the extent to which the 
use of randomization in experimental designs permits evalua- 
tion of the experimental results. The case considered is that in 
which the whole population of treatments is used with
particular experimental material. A completely general
mathematical specification of the design is given and the procedure
by which linear models for the experimental results are de- 
rived is exemplified by the cases of the completely randomized 
design, randomized blocks, Latin squares, and a particular 
systematic design. The case of the completely randomized 
design is discussed extensively. An assessment of the present 
state of randomization theory is given, with a statement of 
major deficiencies. 


INTRODUCTION 


It has become general in experimentation to insist on the physical
act of randomization, which in all cases is randomization over a
finite universe of possibilities. Frequently the analysis is then presented
by way of least squares as regards estimation, and general linear
hypothesis theory as regards tests of significance, the construction of
intervals and so on. For example, Yates [22] has given a nominally exact
test for treatment differences when data are missing from the planned 
randomized experiment. This test is based on the assumption of a 
linear model with errors which are normally and independently dis- 
tributed around zero with constant variance. That this assumption is 
not validated by randomization is clear: for example, in the case of 
randomized blocks there is a negative correlation between the observed 
values for two treatments in any one block. A treatment observation 
in a block may be high because the treatment fell on the highest yield- 
ing plot, and if a treatment so falls then another treatment will fall on a 
lower yielding plot and hence give a lower value. 
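The negative correlation described here can be verified by direct enumeration over the randomization. The Python sketch below assumes no treatment effects, so that each observation is just the yield of the plot on which the treatment falls; the plot yields and the function name are hypothetical and serve only as an illustration.

```python
from itertools import permutations


def randomization_correlation(plot_yields):
    """Correlation, over all random assignments of treatments to the plots of one
    block, between the observations on the first and second treatments."""
    k = len(plot_yields)
    pairs = [(plot_yields[perm[0]], plot_yields[perm[1]])
             for perm in permutations(range(k))]
    count = len(pairs)
    mean_a = sum(a for a, _ in pairs) / count
    mean_b = sum(b for _, b in pairs) / count
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs) / count
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs) / count
    var_b = sum((b - mean_b) ** 2 for _, b in pairs) / count
    return cov / (var_a * var_b) ** 0.5


print(randomization_correlation([1.0, 2.0, 4.0]))        # block of three plots
print(randomization_correlation([3.0, 5.0, 6.0, 9.0]))   # block of four plots
```

Whatever yields are used, the enumeration gives a correlation of $-1/(k-1)$ for a block of $k$ plots, which is inconsistent with the independence assumed by the normal law model.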

Again, it has become general to insist that a design satisfy the
property of unbiasedness [25], whereas if the observations satisfy the
conditions of the simple Markoff theorem, or the general linear hypothesis,
this property is automatically satisfied. Designs otherwise entirely
valid from the point of view of least squares have been removed from
the catalogue of reasonable designs, because the condition of
unbiasedness is not satisfied, for example the semi-Latin square [24].

* Journal paper number J-2675 of the Iowa Agricultural Experiment Station, Ames, Iowa. Project
890. This paper is based on one presented to the American Statistical Association at Chicago on
December 29, 1952.

Tests of significance in the randomized experiment have frequently 
been presented by way of normal law theory, whereas their validity 
stems from randomization theory. The use of tests of significance based
on randomization of the observations was given by Fisher in 1935 in the
first edition of The Design of Experiments [21]. Here the problem was
the comparison of two paired samples of 15 observations. After
commenting on the tendency of theoretical statisticians “to stress the
normality assumption,” Fisher states that “it seems to have escaped
recognition that the physical act of randomization ..., affords the
means, in respect of any particular body of data, of examining the
wider hypothesis in which no normality of distribution is implied.”
In the particular case with the two samples of 15 observations the
actual difference between the totals is 314. Fisher then notes that in
just 1726 of the $2^{15}$ possible partitions of the 15 pairs of observations
into two paired samples of 15, a difference as large as or larger than that 
observed would be obtained. Hence a significance level of 5.267 per 
cent can be attached to the null hypothesis that there is no difference 
between the two populations. For comparison Fisher notes that the 
normal theory test gave a significance level of 4.97 per cent and when 
Yates’s correction for continuity is used, gave a significance level of 
5.158 per cent. In a later paper, Fisher [11], after describing the ran- 
domization method states “. . . conclusions have no justification be- 
yond the fact that they agree with those which could have been arrived 
at by this elementary method.” 
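The elementary method referred to, for paired observations, amounts to attaching every possible combination of signs to the within-pair differences and counting how often the resulting total is numerically at least as large as the one observed. A minimal Python sketch follows; the function name and the three-pair data are hypothetical, and the 15 differences of Fisher's example are not reproduced here.

```python
from itertools import product


def sign_randomization_test(differences):
    """Enumerate all 2^n sign assignments of the paired differences and return
    (count, total, proportion) for totals numerically >= the observed total."""
    observed = abs(sum(differences))
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(differences)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, differences))) >= observed:
            count += 1
    return count, total, count / total


# Hypothetical differences for three pairs; with 15 pairs the loop would
# visit 2**15 = 32768 sign assignments.
print(sign_randomization_test([1, 2, 3]))
```

The proportion returned plays the role of the significance level; in the example quoted in the text it is 1726/32768, or 5.267 per cent.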

Even though the application of general linear hypothesis theory 
appears in many cases to lead to inferences which are essentially cor- 
rect by Fisher’s criterion above, the validity of such normal theory in- 
ferences in the case of some designs, notably small Latin squares, is 
highly questionable. In the case of the 3X3 Latin square, the only 
randomization test of the null hypothesis is of size 50 per cent whereas 
normal theory allows any size of test. Some other confusions existing 
in the literature are: 

(1) the lack of distinction made between the analysis of randomized
blocks and the two-way classification with normally distributed
errors, i.e., data assumed to follow the normal law model

$$y_{ijk} = \mu + a_i + b_j + e_{ijk}, \qquad i = 1, 2, \dots, r;\ j = 1, 2, \dots, s;\ k = 0, 1, 2, \dots, n_{ij};$$

where $\mu$, $a_i$ and $b_j$ are constants and the $e_{ijk}$ are normally and
independently distributed around zero with common variance.
From another point of view the randomized block experiment is
not symmetrical with respect to blocks and treatments, whereas
the above model is symmetrical with respect to the $a_i$ and the $b_j$.

(2) the question of the universe for which the conclusions of the
randomized experiment hold. More fully, is the inference from the
randomized block experiment valid only for the population of
experimental units sampled at random, or for some broader
(always unspecified) population of experimental units? It
appears that the former is the case.

(3) the respective natures of the inferences in the randomized block
experiment without and with the analysis of covariance. The
analysis of covariance gives apparent increased precision of
estimates, over what is obtained with randomized blocks, but
the two analyses, without and with covariance, are not of equal
validity, the latter utilizing more assumptions.

(4) the basis for evaluation of the efficiency of one randomized
design relative to another.


All the methods under discussion herein were originated primarily 
by Fisher: linear model theory [6], extended as the fitting of constants 
by Yates [23]; the use of randomization [8]; the use of randomization 
tests [9]; and the analysis of covariance [7]. 

It is of interest to note in the past few years some explicit recognition 
of the situation, for example by Grundy and Healy [13], Barnard [2], 
and Tukey [20]. The author’s book [14] contains descriptions of both 
normal law analyses and randomization analyses for the basic designs. 
However, a concise connected statement of the whole situation is 
needed with an account in general terms of randomization analyses. 


SCOPE OF THE PAPER 


This paper will deal with only one type of experiment, namely, the 
comparative experiment in which the experimenter wishes to make 
statements about the differences in response or yield produced by a 
fixed set of treatments. Other comparative experiments are of the type 
in which there is a population of experimental units and a population 
of treatments, but only random samples taken in specified ways of 
units and treatments are used. The basic considerations to be given 
apply to broader classes of experiment, but the limitation to the stated 
type is necessary in order to give a reasonably complete account. The 
absolute experiment as defined by Anscombe [1] is not discussed. The 
aim of the paper is to show how randomization gives a fairly complete 
basis for statistical inference from the comparative experiment, par- 
ticularly if additivity as defined later holds, and to exhibit the gaps in 
our present knowledge. 


CRITERIA OF VALIDITY OF EXPERIMENTAL INFERENCE 


The paper by Anscombe [1] contains a detailed discussion of the 
criteria of validity of inferences, but with special emphasis on tests. 
The crucial aspects of a statistical experimental inference in compar- 
ative experiments are that we wish to estimate treatment differences 
and comparisons and, because a point estimate is of little utility with- 
out a measure of its reliability, we must have a measure of reliability. 
In order to use the measure of reliability with confidence we should 
know the joint distribution of the estimate and its estimated reliability. 
Finally as a guide to the future actions of the experimenter, we need 
accurate tests of significance. 

The important aspect of the validity of an inference is the validity 
of the assumptions on which the inference is based. In simple experi- 
mental situations one can assume a model which specifies that the ob- 
servation is equal to its expectation plus an error which is normally and 
independently distributed with constant variance and then apply 
standard linear hypothesis theory. Such a procedure can be questioned 
because the amount of data necessary to verify the assumptions is 
usually outside the ability of the experimenter to collect. The making 
of assumptions of normality and then applying the battery of mathe- 
matical statistical tests is not a satisfactory basis for experimental in- 
ference, because the extent to which the reliability of an inference de- 
pends on the assumptions made in the analysis is usually unknown. 

The great contribution of Fisher to experimental inference was the 
introduction of randomization [8] together with the later recognition 
that randomization tests could be made.¹ It is rather curious that the 
introduction of randomization should permit the making of more ac- 
curate inferences than are possible without its use, more accurate in 
the sense that the probability statements that are made are more ac- 
curate. The inferences also have a definite relation to what is concep- 
tually observable in the situation. 

There have been attempts to base experimental inference on the 
fitting of trend lines, for example, orthogonal polynomials, or 
autoregressive models as well as treatment parameters. These are essentially 
of the same nature as the use of the analysis of covariance, which suffers 
from the defect that an assumption relating the response to the 
concomitant variable is necessary. It is not surprising that an increase in 
the assumptions results in apparent increased accuracy of estimates, 
but it may be debated whether this increased apparent accuracy is 
really true accuracy. 

¹ The author believes that modern probability sampling also was introduced by Fisher and his 
co-workers at Rothamsted in the late 1920's. In this case randomization is the crucial element 
distinguishing modern sampling from that done early in this century. 


DESCRIPTION OF MATHEMATICAL TREATMENT 


Consider a class C of possible patterns for application of treatments 
to experimental units, such as for example the 12 possible 3×3 Latin 
squares, with particular members denoted by c. An element c is chosen 
from the class C according to some probability distribution. Let us 
suppose that the experimental units are denoted by u which runs from 1 to 
N, and the treatments by j which runs from 1 to t. A particular 
experimental pattern is specified by N pairs of numbers (u, j), u denoting the 
experimental unit and j the treatment imposed on that unit. Let 

$$\delta^{j_1 j_2 \cdots j_N}_{1\,2\,\cdots\,N}$$

be unity if treatment $j_1$ is imposed on the first experimental unit, 
treatment $j_2$ on the second and so on, and be zero otherwise. Then the 
probability distribution according to which c is chosen from C is given 
by the numbers 

$$P\left(\delta^{j_1 j_2 \cdots j_N}_{1\,2\,\cdots\,N} = 1\right).$$

We are at liberty to choose any set of $t^N$ nonnegative numbers whose 
sum is unity for the numbers 

$$P\left(\delta^{j_1 j_2 \cdots j_N}_{1\,2\,\cdots\,N} = 1\right),$$

but should consider the consequences of each particular choice. We also 
need certain marginal properties of the distribution of the 

$$\delta^{j_1 j_2 \cdots j_N}_{1\,2\,\cdots\,N}.$$

We will need, for example, to talk about $\delta_u^j$, which equals unity if 
treatment j is applied to experimental unit u and zero otherwise, 
and about $P(\delta_u^j = 1)$. Or for example we may need numbers 
such as $P(\delta_v^k = 1 \mid \delta_u^j = 1)$, which is the probability that treatment k 
is applied to experimental unit v, given that treatment j is applied to 
unit u. The actual realization of a particular pattern is accomplished 
by the use of a random device or random numbers. A basic assumption 
is that we have a random device which generates a distribution with 
the stated properties. 

It is possible by this means to represent all standard types of experi- 
mental design in a unified way, that is, systematic, completely ran- 
domized designs, complete randomized blocks and Latin squares, split- 
plot designs and crossover designs, as well as other types which can be 
visualized but have never, as far as I know, been used. 

The general procedure is to examine the behavior of certain func- 
tions of the observations obtained from each pattern. The experimental 
units will be represented by suffices which indicate the amount of 
stratification in the design. For example, in the randomized block ex- 
periment the units will be denoted by (uv), where u gives the block 
number and v the number of the plot within the block. It is frequently 
convenient to use an identity such as the following, which is useful with 
randomized blocks 


$$x_{uv} = x_{..} + (x_{u.} - x_{..}) + (x_{uv} - x_{u.}) \qquad (1)$$

where u runs from 1 to r, v runs from 1 to s and 

$$x_{u.} = \frac{1}{s} \sum_{v} x_{uv}, \qquad x_{..} = \frac{1}{rs} \sum_{u,v} x_{uv}.$$

This identity is useful in the particular case because it will be seen (15) 
that $x_{..}$, $x_{u.} - x_{..}$ appear as constants while $(x_{uv} - x_{u.})$ is multiplied by a 
random variable. The general procedure will be exemplified for the 
cases of the completely randomized design, the randomized block de- 
sign and the Latin square. 

The delta quantities used here play the same role as when they are 
used in finite sampling theory. They are the imposed random variables 
in the experimental design and it is therefore reasonable that they 
should be exhibited explicitly. Also it turns out that they enable suc- 
cinct presentation of the distributional problems and of the solution of 
these problems. 


ADDITIVITY 


The term “additivity” has been used rather freely in discussions of 
experimental designs, without precise meaning being attached to it. 
For instance a model 


$$y_{ijk} = \mu + a_i + b_j + (ab)_{ij} + e_{ijk}$$

is an additive model in the sense that the right hand side is the sum 
of a number of parameters as opposed to a product or a mixture of sums 
and products. Also if the symbols i, j are used to denote levels of factors, 
it is usually said that additivity of the factors holds if the terms $(ab)_{ij}$ 
are unnecessary. 

A somewhat different concept of additivity is used in randomization 
analysis. We suppose that each experimental unit would have a 
response or yield, say $x_u$ for the $u$th unit, under some basic condition. 
Then additivity implies that the yield of the $u$th unit under treatment j, 
which we may denote by $y_{uj}$, is given by 

$$y_{uj} = x_u + t_j. \qquad (2)$$

Consequences of this are, for example, 

$$y_{uj} - y_{uj'} = t_j - t_{j'},$$

which states that if we could apply both treatments j and j' to each 
experimental unit and observe the difference, then this difference would 
be constant over the experimental units. 

The concept we have given above may be termed additivity in the 
strict sense; if instead of (2) we have 

$$y_{uj} = x_u + t_j + e_{uj} \qquad (3)$$

where the $e_{uj}$'s are independent random variables over all u and j, we 
can say that we have additivity in the broad sense. The estimation pro- 
cedures and tests are identical in the two cases, but the distinction is 
worth making because from it we obtain indications of when random- 
ization is desirable and when it is probably unimportant. Additivity in 
the present context will therefore be used to mean absence of interac- 
tion of treatment and ultimate experimental unit. It is different from 
additivity as used in the previously mentioned cases because in those 
cases the separate identity of treatments and experimental units is not 
maintained. In the case when the units are arranged in groups it is 
desirable to weaken this assumption to one of additivity of treatments 
within groups but not between groups. 

Linear models for the case when the additivity assumption is weak- 
ened have been examined by Wilk (unpublished manuscript). 


THE COMPLETELY RANDOMIZED DESIGN 


In this case with s treatments we have rs = N experimental 
units and each pattern in which every treatment is represented r times 
is given equal probability. If $\delta_u^j$ is unity when treatment j occurs on 
plot u and zero otherwise, then, for $u' \neq u$ and $j' \neq j$, 

$$P(\delta_u^j = 1) = \frac{1}{s}, \qquad P(\delta_u^j = 0) = \frac{s-1}{s},$$

$$P(\delta_{u'}^j = 1 \mid \delta_u^j = 1) = \frac{r-1}{rs-1}, \qquad P(\delta_u^{j'} = 1 \mid \delta_u^j = 1) = 0, \qquad (4)$$

$$P(\delta_{u'}^j = 1 \mid \delta_u^j = 0) = \frac{r}{rs-1},$$

and so on. 
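
The probabilities in (4) can be checked by brute force for a small design. The sketch below is an illustration only; the helper names `crd_patterns` and `prob` and the choice r = 2, s = 3 are assumptions made for the example. It lists every pattern in which each treatment occurs r times, gives each equal probability, and evaluates the marginal and conditional probabilities exactly.

```python
from fractions import Fraction
from itertools import permutations


def crd_patterns(r, s):
    """All distinct assignments of s treatments, r plots each, to r*s plots."""
    base = [k for k in range(s) for _ in range(r)]
    return sorted(set(permutations(base)))


def prob(patterns, event, given=lambda p: True):
    """P(event | given) when every pattern has equal probability."""
    cond = [p for p in patterns if given(p)]
    return Fraction(sum(1 for p in cond if event(p)), len(cond))


r, s = 2, 3                      # hypothetical small design
pats = crd_patterns(r, s)
j, j_other, u, u_other = 0, 1, 0, 1

# P(delta_u^j = 1)
p_marg = prob(pats, lambda p: p[u] == j)
# P(delta_u'^j = 1 | delta_u^j = 1)
p_same = prob(pats, lambda p: p[u_other] == j, lambda p: p[u] == j)
# P(delta_u^j' = 1 | delta_u^j = 1)
p_excl = prob(pats, lambda p: p[u] == j_other, lambda p: p[u] == j)
# P(delta_u'^j = 1 | delta_u^j = 0)
p_not = prob(pats, lambda p: p[u_other] == j, lambda p: p[u] != j)

print(p_marg, p_same, p_excl, p_not)
```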
Under additivity the yield on plot u, with treatment k is 

$$y_{uk} = x_u + t_k = x_{.} + t_k + (x_u - x_{.}).$$

The observed total yield for treatment k is equal to the sum of the $y_{uk}$'s 
for the plots on which treatment k falls, so that we may write 

$$T_k = r(x_{.} + t_k) + \sum_{u=1}^{N} \delta_u^k (x_u - x_{.}) \qquad (5)$$

or 

$$\frac{T_k}{r} = x_{.} + t_k + \frac{1}{r} \sum_{u=1}^{N} \delta_u^k (x_u - x_{.}). \qquad (6)$$

In equation (6), the observed mean yield is expressed as the sum of the 
true mean yield with treatment k, $(x_{.} + t_k)$, plus an error 

$$\frac{1}{r} \sum_{u=1}^{N} \delta_u^k (x_u - x_{.}).$$

This error is a linear function with coefficients $(x_u - x_{.})/r$ of the random 
variables in the experiment, the $\delta_u^k$. We know the distributional 
properties of these random variables. Hence we may find the distributional 
properties of the treatment means and any function of them, and of the 
individual observed yields and the sum of squares of yields within 
treatments. 

Writing the observations in a form such as (6) has very strong 
advantages in that it exhibits precisely the nature of the error of an 
observation and expresses this error in terms of elementary random 
variables whose distributional properties are known. This simplifies 
considerably the problems of examining distributional properties of the 
error, that is of the quantities 

$$\sum_{u=1}^{N} \delta_u^k \frac{(x_u - x_{.})}{r}$$

for the possible values of k. The quantities $(x_u - x_{.})$ are not to be 
regarded as random variables, but as fixed unknown quantities which 
are attached to the treatment observations according to a known 
distribution. The process of taking expectations is simplified by noting 
that $\delta_u^k$ is either zero or unity. Hence 

$$E[(\delta_u^k)^p] = P(\delta_u^k = 1),$$

for all non-zero p, and 

$$E[(\delta_u^k)^p (\delta_{u'}^{k'})^q] = P(\delta_u^k = 1, \delta_{u'}^{k'} = 1),$$

for all non-zero p and q. 
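It is easily seen that a treatment mean is an unbiased estimate of 
$(x_{.} + t_k)$, for 

$$E\left[\frac{T_k}{r} - (x_{.} + t_k)\right] = E\left[\sum_{u=1}^{N} \delta_u^k \frac{(x_u - x_{.})}{r}\right] = \frac{1}{rs} \sum_{u=1}^{N} (x_u - x_{.}) = 0.$$

Also the variance of a treatment mean is 

$$\begin{aligned}
E\left[\frac{T_k}{r} - (x_{.} + t_k)\right]^2
&= \frac{1}{r^2} E\left\{\sum_u \delta_u^k (x_u - x_{.})\right\}^2 \\
&= \frac{1}{r^2} E\left[\sum_u (\delta_u^k)^2 (x_u - x_{.})^2 + \sum_{u' \neq u} \delta_u^k \delta_{u'}^k (x_u - x_{.})(x_{u'} - x_{.})\right] \\
&= \frac{1}{r^2} \left[\sum_u \frac{1}{s} (x_u - x_{.})^2 + \sum_{u' \neq u} \frac{(r-1)}{s(rs-1)} (x_u - x_{.})(x_{u'} - x_{.})\right] \\
&= \frac{(s-1)}{rs(rs-1)} \sum_u (x_u - x_{.})^2 \\
&= \left(\frac{s-1}{s}\right) \frac{\sigma^2}{r}, \quad \text{where } \sigma^2 = \frac{1}{rs-1} \sum_u (x_u - x_{.})^2. \qquad (8)
\end{aligned}$$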
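
Result (8) lends itself to an exact numerical check. In the sketch below the plot values `x` are hypothetical and the function name `treatment_mean_moments` is an assumption; the set of r plots receiving a given treatment is taken to be equally likely among all r-subsets of the rs plots, which is the marginal distribution implied by the completely randomized design.

```python
from fractions import Fraction
from itertools import combinations


def treatment_mean_moments(x, r):
    """Exact mean and variance of the error in one treatment mean,
    taken over all equally likely choices of r plots out of len(x)."""
    n = len(x)
    xbar = Fraction(sum(x), n)
    errs = [sum(Fraction(x[u]) - xbar for u in plots) / r
            for plots in combinations(range(n), r)]
    mean = sum(errs) / len(errs)
    var = sum((e - mean) ** 2 for e in errs) / len(errs)
    return mean, var


x = [3, 7, 8, 12, 15, 21]        # hypothetical plot values x_u
r, s = 2, 3
mean, var = treatment_mean_moments(x, r)

xbar = Fraction(sum(x), len(x))
sigma2 = sum((Fraction(v) - xbar) ** 2 for v in x) / (r * s - 1)
formula = Fraction(s - 1, s) * sigma2 / r   # right-hand side of (8)

print(mean, var, formula)
```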





Also, if A and B are two contrasts of the observed treatment means, 

$$A = \sum_j \lambda_j \frac{T_j}{r}, \qquad \sum_j \lambda_j = 0,$$

$$B = \sum_j \nu_j \frac{T_j}{r}, \qquad \sum_j \nu_j = 0,$$

then it may be shown that 

$$\mathrm{Cov}(A, B) = \left(\sum_j \lambda_j \nu_j\right) \frac{\sigma^2}{r}, \qquad (9)$$

and thence that 

$$V(A) = \left(\sum_j \lambda_j^2\right) \frac{\sigma^2}{r}, \qquad V(B) = \left(\sum_j \nu_j^2\right) \frac{\sigma^2}{r}. \qquad (10)$$

The covariance of two treatment means is therefore $-(\sigma^2/rs)$, which 
is the usual covariance in a finite sampling situation. Furthermore, we 
may make the following subdivision of the total sum of squares among 
the observations and find their expectations. 


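                      Sum of Squares                                                        Expectation of Sum of Squares 

Between treatments    $\frac{1}{r}\sum_k T_k^2 - \frac{1}{rs}\left(\sum_k T_k\right)^2$                $(s-1)\sigma^2 + r\sum_k (t_k - t_{.})^2$ 

Within treatments     by difference                                                         $s(r-1)\sigma^2$ 

Total                 $\sum (\text{observations})^2 - \frac{1}{rs}\left(\sum_k T_k\right)^2$ 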


If we divide the sums of squares by (s−1) and s(r−1) respectively we 
have mean squares both of whose expectations are $\sigma^2$, if the $t_k$'s are 
equal. It is natural to call the quantities (s−1) and s(r−1) the degrees 
of freedom, because the sums of squares can be expressed as the sum of 
squares of the respective number of random variables, which are 
uncorrelated and have the same variance, $\sigma^2$. 
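
The expectations in the table can be verified exactly for a small design by averaging each sum of squares over every equally likely pattern. The plot values `x`, the treatment effects `t` and the function name `anova_expectations` below are hypothetical; strict additivity, $y = x_u + t_k$, is assumed throughout.

```python
from fractions import Fraction
from itertools import permutations


def anova_expectations(x, t, r):
    """Average the between- and within-treatment sums of squares over all
    equally likely completely randomized patterns (strict additivity)."""
    s = len(t)
    n = r * s
    base = [k for k in range(s) for _ in range(r)]
    patterns = set(permutations(base))
    e_between = Fraction(0)
    e_within = Fraction(0)
    for p in patterns:
        y = [Fraction(x[u] + t[p[u]]) for u in range(n)]
        totals = [sum(y[u] for u in range(n) if p[u] == k) for k in range(s)]
        correction = sum(totals) ** 2 / n
        between = sum(T ** 2 for T in totals) / r - correction
        total_ss = sum(v ** 2 for v in y) - correction
        e_between += between
        e_within += total_ss - between      # "by difference"
    return e_between / len(patterns), e_within / len(patterns)


x = [3, 7, 8, 12, 15, 21]   # hypothetical plot values
t = [0, 2, 5]               # hypothetical treatment effects
r, s = 2, len(t)
e_between, e_within = anova_expectations(x, t, r)

xbar = Fraction(sum(x), r * s)
sigma2 = sum((Fraction(v) - xbar) ** 2 for v in x) / (r * s - 1)
tbar = Fraction(sum(t), s)
effect_ss = sum((Fraction(v) - tbar) ** 2 for v in t)
print(e_between, (s - 1) * sigma2 + r * effect_ss)
print(e_within, s * (r - 1) * sigma2)
```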

The analysis of variance given above could be obtained by the use 
of the linear model 

$$y_{uk} = \mu + t_k + e_{uk}, \qquad u = 1, 2, \ldots, r; \quad k = 1, 2, \ldots, s \qquad (11)$$

where $y_{uk}$ is the yield of the kth treatment on the uth plot on which it 
occurs, $\mu$ and $t_k$ are constants and the $e_{uk}$ are uncorrelated errors with 
expectation of zero and the same variance. To establish the property 
of unbiasedness for this design it is (by definition, [25]) necessary to 
show that the expectation over randomizations of the error mean 
square resulting from this model is equal to the mean square among 
all observations in the absence of treatment effects. The property of 
unbiasedness then determines whether the error mean square obtained 
by the use of least squares has an expectation over randomizations 
equal to the true randomization error mean square in the absence of 
treatment effects. It therefore relates to the validity of the analysis of 
the model by simple least squares, this validity being ascertained by 
comparison with randomization analysis. In passing it should perhaps 
be noted that the property has no intrinsic relation to the concept of 
unbiasedness of a test. 

We have shown so far that the use of randomization and additivity 
together give us essentially the same results for estimates as we obtain 
by the use of the normal theory model in which the $e_{uk}$'s of equation 
(11) are also normally distributed. However, we may note the following 
differences: 


(1) With randomization and additivity, as we have defined it, each 
observation is automatically subject to the same variance, 
whereas this must be assumed in the application of (11); 

(2) A treatment mean has a variance of $[(s-1)/s]\sigma^2/r$, but may 
be regarded as having a variance of $\sigma^2/r$ for comparison with other 
treatment means in the same experiment. 


We should note that the main results we have given above are 
essentially true also when there is additivity in the broad sense, that is, 
there is an error additional to the error $\sum_u \delta_u^k (x_u - x_{.})$. This latter error 
may be conveniently termed the plot error, and the total error consists 
of the plot error plus the additional error (which may be, for example, 
a measurement error). 

Randomization affects only the plot error, and this statement indi- 
cates the fields of application where randomization is important. One 
can envisage experimental situations where plot errors are trivial and 
the additional errors large in comparison to them. In such a situation 
the lack of randomization will not seriously invalidate the experimental 
conclusions. This seems to be the main reason why the randomized 
experiment has not been essential to progress in some of the physical 
and chemical sciences. 





There remain the questions of the joint distribution of the estimates 
and their estimated variances and of tests of hypotheses about the 
existence of treatment effects. The distributional problem is in prin- 
ciple straightforward, although its exact solution may be unattainable. 
Tests of hypotheses may be tackled by the randomization test proce- 
dure. We choose a function of the observations which we evaluate for 
the experimental pattern used and for each of the possible patterns 
with probabilities specified by the randomization procedure followed. 

This latter will give rise to a discrete distribution of the function 
and our test procedure is to note the proportion of cases weighted by 
their probabilities in which the value of the function exceeds the value 
in the pattern which was used. If this proportion is equal to p per cent, 
then we can state that the significance level to be attached to the null 
hypothesis that treatments have no effect is equal to p per cent. If we 
use a 5 per cent test, we shall reject the null hypothesis if p is less than 5 
per cent. 
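
The test procedure just described can be sketched for the completely randomized design with equal probabilities. The data, the pattern labelled as the one used, and the names `between_ss` and `randomization_test` are hypothetical. One assumption should be noted: the sketch counts patterns whose criterion is as large as or larger than the observed value (so the pattern actually used is itself counted), following Fisher's wording quoted earlier rather than strict exceedance.

```python
from fractions import Fraction
from itertools import permutations


def between_ss(y, pattern, s, r):
    """Between-treatment sum of squares for a given assignment pattern."""
    totals = [sum(y[u] for u in range(len(y)) if pattern[u] == k)
              for k in range(s)]
    return (Fraction(sum(T * T for T in totals), r)
            - Fraction(sum(totals) ** 2, r * s))


def randomization_test(y, used_pattern, s, r):
    """Proportion of equally likely patterns whose criterion is at least
    as large as that of the pattern actually used (null: no effects)."""
    base = [k for k in range(s) for _ in range(r)]
    patterns = sorted(set(permutations(base)))
    observed = between_ss(y, used_pattern, s, r)
    hits = sum(1 for p in patterns if between_ss(y, p, s, r) >= observed)
    return Fraction(hits, len(patterns))


y = [12, 15, 11, 19, 22, 20]          # hypothetical observed yields
used = (0, 0, 0, 1, 1, 1)             # hypothetical pattern actually used
level = randomization_test(y, used, s=2, r=3)
print(f"significance level = {float(100 * level):.1f} per cent")
```

With unequal probabilities for the patterns, the count in `randomization_test` would be replaced by a sum of pattern probabilities.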

We should note here that the test procedure is described in such a 
way that we may delimit the set of possible configurations we shall use 
in any way we like, and can give whatever probability we like to par- 
ticular plans, as far as the full randomization test procedure is con- 
cerned. If we have 20 experimental units which are in a row and are 
comparing 4 treatments and hence have 5 replicates, we could give zero 
probability to the patterns in which the treatments occur along the 
row in consecutive groups of five units. Such a procedure presents diffi- 
culties with regard to estimation, however, for the analysis of variance 
is intimately related to equal probabilities for all possible arrangements, 
and as far as I know the estimation procedure corresponding to any 
joint distribution of the $\delta_u^j$'s, other than those for the basic designs, 
has not been worked out though this might be a useful as well as amus- 
ing problem. Off-hand the balanced relations of the basic designs ap- 
pear to be crucial for a simple analysis of experiment. 

We now consider an approximation to the randomization test pro- 
cedure for the completely randomized design, this approximation 
being suggested by the normal law theory. We note that under the null 
hypothesis, the treatment sum of squares plus the error sum of squares 
is equal to a constant, namely, 


$$\sum_u (x_u - x_{.})^2, \quad \text{or} \quad (rs - 1)\sigma^2,$$

which may be conveniently denoted by $S_2$. The normal theory test 
would be to compare 

$$\frac{\text{treatment mean square}}{\text{error mean square}}$$

with the F distribution. Since the sum of squares for treatments and 
sum of squares for error add to a constant, it is appropriate to consider 
the distribution over randomizations of the treatment sum of squares 
T divided by the total sum of squares $S_2$, i.e., $T/S_2$. 

We have already seen that under the null hypothesis, 

$$E(T) = \frac{(s-1)}{(rs-1)} S_2.$$

The variance of T is a rather complicated expression, but as r gets large, 
it is approximately equal to $[2(s-1)/r^2s^2]\,S_2^2$. The expectation of $T/S_2$ 
is $(s-1)/(rs-1)$, which is the mean of the beta distribution 
corresponding to the normal theory F test. The variance of the beta 
distribution with parameters (s−1) and (rs−1) is 

$$\frac{2(r-1)s(s-1)}{(rs-1)^2(rs+1)}.$$

As r gets large this is closely approximated by $2(s-1)/r^2s^2$, which is 
the value obtained for the randomization distribution as r gets large. 
Insofar as the randomization distribution of T can be represented by 
the first two moments, this distribution will be closely represented by 
the corresponding beta distribution if r is large. In a few cases that I 
have examined the approximation works well when r is small, for 
example, 4. It is, therefore, plausible that the randomization test can be 
approximated by the normal theory analysis of variance test. This 
serves as some theoretical basis for the fact which has been noticed by 
most statisticians, that the level of significance of the analysis of 
variance test for differences between treatments is little affected by the 
choice of a scale of measurement for analysis. 
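
The comparison of moments can be illustrated numerically. The sketch below, with hypothetical plot values and hypothetical function names, computes the exact randomization mean and variance of $T/S_2$ by enumeration and sets them beside the moments of a beta distribution. It assumes the usual shape parameters $(s-1)/2$ and $s(r-1)/2$ for the beta form of the F test, which reproduce the mean and variance quoted in the text.

```python
from fractions import Fraction


def patterns(counts):
    """Yield every distinct sequence using treatment k exactly counts[k] times."""
    if sum(counts) == 0:
        yield ()
        return
    for k, c in enumerate(counts):
        if c > 0:
            rest = counts[:k] + (c - 1,) + counts[k + 1:]
            for tail in patterns(rest):
                yield (k,) + tail


def ratio_moments(x, r, s):
    """Exact randomization mean and variance of T/S2 under the null hypothesis."""
    n = r * s
    total = sum(x)
    s2 = sum(v * v for v in x) - Fraction(total * total, n)
    values = []
    for p in patterns((r,) * s):
        totals = [0] * s
        for u, k in enumerate(p):
            totals[k] += x[u]
        t_ss = Fraction(sum(T * T for T in totals), r) - Fraction(total * total, n)
        values.append(t_ss / s2)
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def beta_moments(r, s):
    """Mean and variance of Beta((s-1)/2, s(r-1)/2) (assumed parameterization)."""
    a, b = Fraction(s - 1, 2), Fraction(s * (r - 1), 2)
    return a / (a + b), a * b / ((a + b) ** 2 * (a + b + 1))


x = [3, 7, 8, 12, 15, 21, 22, 30, 41]   # hypothetical plot values
r, s = 3, 3
rand_mean, rand_var = ratio_moments(x, r, s)
beta_mean, beta_var = beta_moments(r, s)
print(rand_mean, beta_mean)
print(float(rand_var), float(beta_var))
```

The means agree exactly for any set of plot values; the variances agree only approximately, and how closely depends on the values chosen.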





COMPLETE RANDOMIZED BLOCKS 


This case will be reviewed rather briefly because detailed treatment 
is given in [14], Section 8.2. Here, only the case when additivity in the 
strict sense holds will be considered. As before the lessening of assump- 
tions to additivity in the broad sense results in little change. 

Denote the conceptual yield with treatment k (= 1, 2, ..., s) on 
plot v (= 1, 2, ..., s) of block u (= 1, 2, ..., r) by $y_{uvk}$. Under 
additivity 

$$y_{uvk} = x_{uv} + t_k. \qquad (12)$$

Now 

$$y_{uvk} = y_{...} + (y_{u..} - y_{...}) + (y_{uvk} - y_{uv.}) + (y_{uv.} - y_{u..}) \qquad (13)$$

or, using (12), 

$$y_{uvk} = x_{..} + t_{.} + (x_{u.} - x_{..}) + (t_k - t_{.}) + (x_{uv} - x_{u.}). \qquad (14)$$

We observe the yield from treatment k on a random plot v of each block 
u. The observed yield we write as $z_{uk}$ and we can write 

$$z_{uk} = x_{..} + t_{.} + (x_{u.} - x_{..}) + (t_k - t_{.}) + \sum_v \delta_{uv}^k (x_{uv} - x_{u.}) \qquad (15)$$

or, if $\mu$ is written for $(x_{..} + t_{.})$, $b_u$ for $(x_{u.} - x_{..})$, and $t_k$ is written for $(t_k - t_{.})$, 

$$z_{uk} = \mu + b_u + t_k + e_{uk}, \qquad (16)$$

which is the usual form for the model, with an obvious correspondence 
of symbols. The distributional properties of $\delta_{uv}^k$ for the complete 
randomized block design are easily written out. Hence, theoretically at 
least, the distributional properties of any function of the observations 
can be obtained. The results on the estimates of treatment differences 
and their errors are with one exception the results on estimation 
obtained with normal law theory. The exception, which occurred also in 
the previous case of completely randomized designs, is that treatment 
totals or means are negatively correlated in the usual way for samples 
from finite populations. This correlation is such that we may for the 
purposes of comparing treatments within the experiment attribute a 
variance of $\sigma^2/r$ to each treatment mean. However, the variance of a 
treatment mean for comparison with outside data is $[(s-1)/s]\sigma^2/r$ 
and not $\sigma^2/r$ which is the formula generally used. 

It was shown by Welch [21] and Pitman [17] that the randomiza- 
tion test may be approximated by the normal theory F test with (s—1) 
and (r—1) (s—1) degrees of freedom. Hence under some circumstances 
the randomization distribution is probably represented fairly accurately 
by the corresponding normal theory distribution, so that the standard 
F test may be used. Some empirical support of this statement was ob- 
tained earlier by Eden and Yates [5]. 

An interesting facet of the randomization situation is the distribution 
of the error sum of squares. It appears that the variance of the error 
sum of squares is 

$$\frac{2(r-1)(s-1)}{r} \sigma^4$$

over the randomization distribution compared to $2(r-1)(s-1)\sigma^4$ over 
the normal theory distribution. 


THE EFFICIENCY OF RANDOMIZED BLOCKS 


The accuracy of Yates’s [24] method of estimating the efficiency of 
randomized blocks relative to the completely randomized design has 
been questioned [4]. This is not surprising since Yates presents the 
analysis in terms of a normal law model, while the efficiency is based 
on a randomization argument. Yates’s procedure is as follows, given 
the analysis of variance in the table below for the observed yields. 


              d.f.             Mean Square 
Blocks        (r−1)            B 
Treatments    (s−1)            T 
Error         (r−1)(s−1)       E 


Consider the error mean square in the absence of treatment effects 
if blocks had not been used. This is obtained by substituting E for T 
and getting the overall mean square, which is 

$$\frac{(r-1)B + r(s-1)E}{rs - 1}.$$

This divided by E is the estimated efficiency of the blocks. 
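
As a worked illustration of the procedure, the sketch below computes the three mean squares from a table of observed yields and then the estimated efficiency. The function names `rb_mean_squares` and `estimated_efficiency` and the yields in `z` (rows are blocks, columns are treatments) are hypothetical.

```python
def rb_mean_squares(z):
    """Block, treatment and error mean squares (B, T, E) for an r x s table
    of yields, rows being blocks and columns treatments."""
    r, s = len(z), len(z[0])
    grand = sum(map(sum, z)) / (r * s)
    block_means = [sum(row) / s for row in z]
    treat_means = [sum(z[u][k] for u in range(r)) / r for k in range(s)]
    ss_blocks = s * sum((m - grand) ** 2 for m in block_means)
    ss_treat = r * sum((m - grand) ** 2 for m in treat_means)
    ss_total = sum((v - grand) ** 2 for row in z for v in row)
    ss_error = ss_total - ss_blocks - ss_treat
    return (ss_blocks / (r - 1),
            ss_treat / (s - 1),
            ss_error / ((r - 1) * (s - 1)))


def estimated_efficiency(B, E, r, s):
    """Overall mean square with E substituted for T, divided by E."""
    overall = ((r - 1) * B + r * (s - 1) * E) / (r * s - 1)
    return overall / E


z = [[10, 12, 15],      # hypothetical yields: 4 blocks, 3 treatments
     [14, 15, 19],
     [20, 23, 24],
     [25, 26, 31]]
B, T, E = rb_mean_squares(z)
efficiency = estimated_efficiency(B, E, r=len(z), s=len(z[0]))
print(f"B = {B:.3f}, T = {T:.3f}, E = {E:.3f}, efficiency = {efficiency:.2f}")
```

When the block mean square equals the error mean square the estimated efficiency is unity, as one would expect if blocking removed no variation.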

This is derived easily by the use of the previous notation. Number 
the plots by u and v, the block number and the number of the plot 
within the block respectively. The difference between the two designs 
is with respect to the distribution of the quantities $\delta_{uv}^k$. In the case of 
the completely randomized design 

$$P(\delta_{uv}^k = 1) = \frac{1}{s}$$

as before, but 

$$P(\delta_{uv'}^k = 1 \mid \delta_{uv}^k = 1) = \frac{(r-1)}{rs-1}, \qquad v' \neq v,$$

compared to zero for randomized blocks. This is the only difference in 
the distributions which is needed. 

The yield $y_{uvk}$ satisfies 

$$y_{uvk} = x_{uv} + t_k = x_{..} + t_k + (x_{uv} - x_{..}).$$

So the observed yields $z_{uk}$ in the completely randomized design satisfy 

$$z_{uk} = \mu + t_k + \sum_{uv} \delta_{uv}^k (x_{uv} - x_{..}).$$

As we have seen the error mean square has an expectation of 

$$\frac{1}{(rs-1)} \sum_{uv} (x_{uv} - x_{..})^2 = \sigma_{CR}^2, \text{ say.}$$

However, 

$$\sum_{uv} (x_{uv} - x_{..})^2 = s \sum_u (x_{u.} - x_{..})^2 + \sum_{uv} (x_{uv} - x_{u.})^2,$$

or 

$$(rs - 1)\sigma_{CR}^2 = (r - 1)B + r(s - 1)\sigma_{RB}^2.$$

The estimation procedure follows directly from this equation since the 
expectation of the error mean square equals $\sigma_{RB}^2$. 


MODIFICATIONS OF RANDOMIZED BLOCKS 


Incomplete randomized blocks are from the present point of view 
rather minor modifications of complete randomized blocks. Split-plot 
designs have a different joint distribution of the $\delta$'s. Suppose the 
notation is as follows: 

block number            u = 1, 2, ... 
whole-plot number       v = 1, 2, ... 
split-plot number       w = 1, 2, ... 
whole-plot treatment    j = 1, 2, ... 
split-plot treatment    m = 1, 2, ... 

Then, for example, 

$$P(\delta_{uvw}^{jm} = 1) = \frac{1}{rs},$$

$$P(\delta_{uvw'}^{jm'} = 1 \mid \delta_{uvw}^{jm} = 1) = \frac{1}{s-1}, \qquad w' \neq w, \; m' \neq m,$$

and so on. The analysis is developed from this point of view in ([14], 
Section 19.1). 






THE LATIN SQUARE WITH ADDITIVITY 


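Let $y_{uvk}$ be the yield for treatment k (= 1, 2, ..., t) on the plot in 
row u (= 1, 2, ..., t) and column v (= 1, 2, ..., t). Then we may 
write for the observed total $T_k$ for treatment k 

$$T_k = t(\mu + \rho_{.} + \gamma_{.} + t_k) + \sum_{uv} \delta_{uv}^k e_{uv} \qquad (17)$$

where 

$$e_{uv} = (x_{uv} - x_{u.} - x_{.v} + x_{..}). \qquad (18)$$

If we take 

$$\begin{aligned}
P(\delta_{uv}^k = 1) &= \frac{1}{t}, \\
P(\delta_{u'v}^k = 1 \mid \delta_{uv}^k = 1) &= 0, \qquad u' \neq u, \\
P(\delta_{uv'}^k = 1 \mid \delta_{uv}^k = 1) &= 0, \qquad v' \neq v, \\
P(\delta_{u'v'}^k = 1 \mid \delta_{uv}^k = 1) &= \frac{1}{t-1}, \qquad u' \neq u, \; v' \neq v,
\end{aligned} \qquad (19)$$

and denote $[1/(t-1)^2] \sum_{uv} e_{uv}^2$ by $\sigma^2$, we find, for $\sum_k \lambda_k = 0$, $\sum_k \nu_k = 0$, that 

$$E\left(\sum_k \lambda_k \frac{T_k}{t}\right) = \sum_k \lambda_k t_k,$$

$$\mathrm{Cov}\left(\sum_k \lambda_k \frac{T_k}{t}, \sum_k \nu_k \frac{T_k}{t}\right) = \left(\sum_k \lambda_k \nu_k\right) \frac{\sigma^2}{t}.$$

The expectation of the error mean square is $\sigma^2$, so that the elementary 
normal law procedures are obtained. The conditions given in (19) are 
mathematical expressions of conditions given by Fisher in his 
fundamental paper on the subject [8]. 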
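
For the 3×3 case the conditions (19) can be confirmed by listing all 12 Latin squares and giving each equal probability. The sketch below is a brute-force illustration with hypothetical function names. It also counts the distinct partitions of the nine plots into treatment groups, which bears on the earlier remark that the only randomization test for the 3×3 square is of size 50 per cent: a criterion that does not depend on the labelling of treatments can take at most as many values as there are partitions.

```python
from fractions import Fraction
from itertools import permutations, product


def latin_squares(t):
    """All t x t Latin squares as tuples of rows (brute force; small t only)."""
    rows = list(permutations(range(t)))
    return [combo for combo in product(rows, repeat=t)
            if all(len({combo[u][v] for u in range(t)}) == t for v in range(t))]


def cond_prob(squares, cell, other, k=0):
    """P(treatment k on plot `other` | treatment k on plot `cell`)."""
    given = [sq for sq in squares if sq[cell[0]][cell[1]] == k]
    hits = sum(1 for sq in given if sq[other[0]][other[1]] == k)
    return Fraction(hits, len(given))


def treatment_partition(square):
    """The grouping of plots by treatment, ignoring treatment labels."""
    t = len(square)
    return frozenset(
        frozenset((u, v) for u in range(t) for v in range(t) if square[u][v] == k)
        for k in range(t))


t = 3
squares = latin_squares(t)
p_marg = Fraction(sum(1 for sq in squares if sq[0][0] == 0), len(squares))
p_same_row = cond_prob(squares, (0, 0), (0, 1))
p_same_col = cond_prob(squares, (0, 0), (1, 0))
p_other = cond_prob(squares, (0, 0), (1, 1))
n_partitions = len({treatment_partition(sq) for sq in squares})
print(len(squares), p_marg, p_same_row, p_same_col, p_other, n_partitions)
```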
The normal approximation to the randomization test is not as easy 
to examine as in the previous cases. The quantity 

$$U = \frac{\text{treatment sum of squares}}{\text{error sum of squares plus treatment sum of squares}}$$

is easily seen to have an expectation of $1/(t-1)$, by the argument 
given by Fisher [10]. Welch [21] found that the variance of U 
depended on the population of Latin Squares from which the random one 
was chosen, in fact, on the transformation set, and on functions of the 
$e_{uv}$'s. Welch found indications that the significance level indicated by 
the F table could be in error appreciably. With some constructed data, 
he estimated that the 5 per cent tabular F value would be exceeded in a 
proportion of 2.7 to 6.4 per cent of randomizations. It is interesting to 
speculate whether there exists a large family of Latin Squares different 
from the families specified by Fisher [8] and by Fisher and Yates [12], 
which has better properties with respect to the approximation of the F 
distribution to the randomization distribution of the analysis of 
variance criterion. 

THE KNUT VIK SQUARE 


It is not my aim here to consider in general the Knut Vik Square 
which is the classic case of a systematic design, but it seems desirable 
toinclude at least one example of a systematic design to show the gen- 
erality of the approach. The Knut Vik Square is as follows: 


ABCDE 
DEABC 
BCDEA 
EABC D 
C DEAB 


If rows are characterized as levels 0, 1, 2, 3 and 4 of a factor z, and 
columns as levels 0, 1, 2, 3, 4 of a factor y, the 24 degrees of freedom be- 
tween the 25 cells can be represented by 6 sets of 4 degrees of freedom, 
which are usually denoted by X, Y, XY, XY?, XY* and XY‘. The par- 
titions corresponding to XY and XY‘ are diagonal partitions, XY? 
corresponds to the above square and X Y* to another Knut Vik square. 
It has been suggested [16] that the experimenter should choose X Y? 
or XY* to give the treatment pattern and then use the other for the 
estimation of error. If we number the rows by u(=0, 1, 2, 3, 4), the 
columns by »(=0, 1, 2, 3, 4) and treatments by 7(=0, 1, 2, 3, 4), then 
under random assignment of treatments to letters, a random choice of 
the degrees of freedom XY? or XY? to be associated with treatments, 
the observed treatment total becomes 
Ty = t(u + Pu + Vv + ty) + bs Suv? (Luv — Lu. — La + z..) 

where the p’s, y’s and ¢’s are row, column and treatment contributions, 
and 5u»* equals unity if treatment 7 appears in plot u and is zero other- 
wise. In this case with the Latin Square 


P (Sue! = 1) = $ 
but P(8y-.'=1|8,,*=1) is not a constant for all (u’v’) unequal to (uv). 





964 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 19g 


If u′+2v′ ≡ u+2v (mod 5), for instance, this probability is one-half, and
if neither u′+2v′ ≡ u+2v (mod 5) nor u′+3v′ ≡ u+3v (mod 5), the probability
is zero. Complete specification of these probabilities would be tedious
and is not necessary for our purposes. From the strict point of view of
randomization tests, it is clear that one can make only a test of the
hypothesis of equality of treatment effects with a size of 50 per cent.
In other words, one will use a test which will reject the hypothesis when
it is true 50 per cent of the time. Surely in this case randomization
cannot be appealed to as a justification for a normal law analysis.
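
The probabilities quoted for the Knut Vik Square can be checked by brute force. The following is a minimal sketch, not part of the original argument: it assumes that the XY² pattern groups plots by (u + 2v) mod 5 and the XY³ pattern by (u + 3v) mod 5, and it enumerates both patterns together with all 120 assignments of treatments to letters. The function names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

N = 5  # number of rows, columns and treatments


def arrangements():
    """Yield every plan: a pattern multiplier m (2 for XY^2, 3 for XY^3)
    combined with every assignment of treatments to the five letters."""
    for m in (2, 3):
        for perm in permutations(range(N)):
            yield {(u, v): perm[(u + m * v) % N]
                   for u in range(N) for v in range(N)}


PLANS = list(arrangements())  # 2 * 5! = 240 equally likely plans


def prob_on_plot(j, plot):
    """P(treatment j falls on the given plot)."""
    return Fraction(sum(p[plot] == j for p in PLANS), len(PLANS))


def cond_prob(j, plot, other):
    """P(treatment j on `other` | treatment j on `plot`)."""
    given = [p for p in PLANS if p[plot] == j]
    return Fraction(sum(p[other] == j for p in given), len(given))


def distinct_sums_of_squared_totals(y):
    """Distinct values of the sum of squared treatment totals over all
    plans, for a dict y mapping each plot (u, v) to its yield."""
    values = set()
    for p in PLANS:
        totals = [0] * N
        for plot, j in p.items():
            totals[j] += y[plot]
        values.add(sum(t * t for t in totals))
    return values
```

Under these assumptions `prob_on_plot` returns 1/5 for every plot, while `cond_prob` returns 1/2 for a pair of plots that share a class in one of the two patterns and 0 otherwise. Relabelling the letters leaves the set of treatment totals unchanged, so `distinct_sums_of_squared_totals` yields at most two values, one per pattern, which is in line with the 50 per cent size noted in the text.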


ANALYSIS OF DESIGNS WITH NON-ADDITIVITY 


It appears from the preceding description that by and large, experi- 
mental inferences may be based entirely on randomization providing 
additivity holds. The main concern of the experimenter should then be 
towards the determination and use of a scale for which treatments are 
additive in their effects. It has seemed to me that there has been in the 
past far too much emphasis on homogeneity of errors. It is remarkable 
that there is only one paper in the literature which tackles the problem 
of testing for additivity, namely Tukey’s test [19]. This test is based 
on a linear model with normally distributed errors. Actually tests for
homogeneity of variance tend to be used somewhat as tests for additiv- 
ity, in that the error law which is guessed at is the basis for a transfor- 
mation and an additive linear model is used on the transformed vari- 
able. 
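
As an illustration of the test cited above, here is a minimal sketch of the usual textbook form of Tukey's one-degree-of-freedom statistic for a two-way table with one observation per cell. The source only cites [19] and does not reproduce the formula, so the function name, the return layout and the handling of a vanishing remainder are assumptions made for this example. The table must show some variation between rows and between columns, otherwise the denominator is zero.

```python
def tukey_one_df(table):
    """Tukey's one-degree-of-freedom test for non-additivity.

    `table` is a list of rows with one observation per cell.
    Returns (ss_nonadditivity, ss_interaction, f_statistic).
    """
    a, b = len(table), len(table[0])
    grand = sum(map(sum, table)) / (a * b)
    row_eff = [sum(row) / b - grand for row in table]
    col_eff = [sum(table[i][j] for i in range(a)) / a - grand
               for j in range(b)]

    # Residual (interaction) sum of squares after the additive fit.
    ss_int = sum((table[i][j] - grand - row_eff[i] - col_eff[j]) ** 2
                 for i in range(a) for j in range(b))

    # Single degree of freedom: regression of the observations on the
    # product of the estimated row and column effects.
    num = sum(table[i][j] * row_eff[i] * col_eff[j]
              for i in range(a) for j in range(b)) ** 2
    den = sum(r * r for r in row_eff) * sum(c * c for c in col_eff)
    ss_non = num / den

    remainder = ss_int - ss_non
    df_rem = (a - 1) * (b - 1) - 1
    if remainder > 1e-12:
        f_stat = ss_non / (remainder / df_rem)
    else:
        # Remainder is numerically zero: report an unbounded ratio.
        f_stat = float("inf")
    return ss_non, ss_int, f_stat
```

For a purely multiplicative table the single degree of freedom absorbs the whole interaction sum of squares, which suggests that a transformation (here the logarithm) would restore additivity; for an exactly additive table the non-additivity sum of squares is zero.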

In view of the fact that perfect additivity on any one simple scale is 
unlikely to hold, it behooves us to consider the problem of estimation 
when there is non-additivity. The problem of the sensitivity of the 
randomization test, which will be discussed later for the case of ad- 
ditivity, is extremely difficult when there is non-additivity, and I know 
of no work at all on the matter. 

For the case of randomized blocks it is found that block treatment
interactions must be zero in order that the design be unbiased in Yates’s
sense.

It is perhaps worthwhile interpolating a consequence of this fact
with respect to the design of randomized block experiments. It does not
appear to be at all desirable to section the experimental material into
ordinary randomized blocks, of, let us say, highly different fertilities
(or basal yields) because this procedure is likely to lead to block
treatment interactions. In fact the requirement that the whole of the
experimental material be as homogeneous as possible, which is the
requirement in physical or chemical experiments, also holds for other
experiments. The impression is sometimes given that experimenters need only
randomize and their troubles are ended. In fact only some of their
troubles are ended.

It is easily proved that any comparison of observed treatment means
is an unbiased estimate of the same contrast of the “true” treatment
means. Here the “true” treatment mean is the mean which would have 
been observed if all the experimental material were subjected to the 
treatment. This seems a reasonable definition to use. Hence the infer- 
ences described here refer to the population of experimental units used 
in the experiment and not to some vaguely defined broader population. 
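
The unbiasedness claim can be illustrated by complete enumeration on a toy case. This is a sketch under stated assumptions rather than the proof referred to above: it uses a completely randomized design with two treatments, and the potential yields `Y_A` and `Y_B` are hypothetical numbers chosen so that the treatment effect differs from unit to unit (non-additivity).

```python
from fractions import Fraction
from itertools import combinations


def mean(values):
    """Exact arithmetic mean as a Fraction."""
    return Fraction(sum(values), len(values))


def average_estimate(y_a, y_b, n_a):
    """Average, over every assignment of treatment A to n_a of the units
    (the remaining units receive B), of the observed difference
    mean(B) - mean(A)."""
    units = range(len(y_a))
    estimates = []
    for chosen in combinations(units, n_a):
        obs_a = [y_a[u] for u in chosen]
        obs_b = [y_b[u] for u in units if u not in chosen]
        estimates.append(mean(obs_b) - mean(obs_a))
    return mean(estimates)


# Hypothetical potential yields: the effect of B over A is 1, 4, 0 and 7
# on the four units, so the treatments are clearly not additive.
Y_A = [3, 5, 8, 10]
Y_B = [4, 9, 8, 17]

# "True" difference: both means taken over all the experimental units.
TRUE_DIFF = mean(Y_B) - mean(Y_A)
```

Averaged over all equally likely assignments, the observed difference equals `TRUE_DIFF` exactly, even though no single unit shows that difference. The inference refers only to these four units, as the text emphasizes.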

Some work on the estimation of variances of treatment comparisons, 
based on a suggestion of Neyman [15], is given in [14], Section 8.3. If 
the non-additivity is slight in some sense, my own opinion is that the 
usual formulas are reasonable and unlikely to lead one astray. The 
situation with respect to non-additivity for the completely random- 
ized design is essentially the same as for randomized blocks. 

In the case of the Latin Square, however, there appears to be no
work on non-additivity. The validity of the Latin Square for estimation
is based entirely on the assumption of additivity, since one is picking
out terms like $(x_{uv} - x_{u\cdot} - x_{\cdot v} + x_{\cdot\cdot})$ at random to be attached to
treatment totals. This is the fact which results in the difficulty of estimating
both direct and residual effects in crossover designs.

The randomization test for the Latin Square or for any randomized 
design is entirely valid in the sense of controlling Type I errors, but the 
approximation to this test by the F-distribution when there is non- 
additivity is apparently completely unknown. 


THE POWER OF THE RANDOMIZATION TEST 


It seems intuitively reasonable that the power of the randomization 
analysis of variance test, when additivity holds, will be equal
asymptotically (as r and t get large) to the power of the normal theory analysis
of variance test. It seemed worthwhile ([14], Section 12.6) therefore to
conduct a small empirical investigation of the power of the
randomization test and to compare this power with the power of the analysis of
variance test which is obtainable from Tang’s tables [18]. There was 
an indication that the sensitivity with intermediate treatment differ- 
ences is not well approximated by Tang’s tables. 

More work needs to be done on this matter. 
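
For readers who wish to experiment, the randomization test itself is easy to compute exactly for a small randomized block layout. The sketch below is an assumption-laden illustration and not the investigation reported in [14]: it enumerates every within-block assignment and uses the sum of squared treatment totals as the criterion, which is equivalent to the analysis of variance criterion because the total and block sums of squares are fixed by the data. Power could then be estimated by applying the function to many data sets generated with known treatment differences.

```python
from fractions import Fraction
from itertools import permutations, product


def sum_squared_totals(blocks, plan):
    """Sum of squared treatment totals.

    blocks[b][k] is the yield of plot k in block b and plan[b][k] is the
    treatment placed on that plot. With the plot yields held fixed, the
    F criterion is an increasing function of this quantity.
    """
    t = len(blocks[0])
    totals = [0] * t
    for yields, assignment in zip(blocks, plan):
        for y, treatment in zip(yields, assignment):
            totals[treatment] += y
    return sum(total * total for total in totals)


def randomization_p_value(blocks, observed_plan):
    """Exact significance level: the proportion of all within-block
    assignments whose criterion is at least as large as that observed."""
    t = len(blocks[0])
    observed = sum_squared_totals(blocks, observed_plan)
    plans = list(product(permutations(range(t)), repeat=len(blocks)))
    extreme = sum(sum_squared_totals(blocks, plan) >= observed
                  for plan in plans)
    return Fraction(extreme, len(plans))
```

With three blocks of three plots there are (3!)³ = 216 plans, and the six relabellings of the observed plan always give the same criterion, so the smallest attainable significance level is 6/216 = 1/36.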


CONCLUDING REMARKS 
I have attempted to give an ordered and, in terms of present
knowledge, an integrated account of the randomization theory of
experimental inference. The gaps in this theory are in my opinion:


(1) the accuracy of the approximation to randomization tests by F 
tests 

(2) the rather stringent role of additivity (which is also present in 
the case of normal law inferences) 

(3) the power of the randomization analysis of variance test 

(4) the consideration of alternative test criteria. 


My own view on item (4) is that in view of the optimum properties 
of the analysis of variance test for the normal law model, the answer 
obtained in an investigation of this item will tend to support the use 
of the analysis of variance criterion. 

There is, I suppose, no limit to the number of test criteria which could
be used and in recent years there has been some emphasis on
non-parametric criteria. It should be realized that the analysis of variance
test with the F distribution has a fair basis apart from normal law
theory and is probably in most cases a good approximation to the
randomization analysis of variance test, which is a non-parametric test.
When one considers the whole problem of experimental inference, that
is of tests of significance, estimation of treatment differences and
estimation of the errors of estimated differences, there seems little point in
the present state of knowledge in using a method of inference other than
randomization analysis.


REFERENCES 


[1] Anscombe, F. J., “The validity of comparative experiments,” Journal of
the Royal Statistical Society, A-111 (1948), 181-211.

[2] Barnard, G. A., “Discussion to paper by K. D. Tocher,” Journal of the Royal 
Statistical Society, B-14 (1952), 91-2. 

[3] Cornfield, Jerome, “On samples from finite populations,” Journal of the 
American Statistical Association, 39 (1944), 236-9. 

[4] Daniels, H. E., “The estimation of components of variance,” Supplement
to the Journal of the Royal Statistical Society, 6 (1939), 186-97. 

[5] Eden, T., and Yates, F., “On the validity of Fisher’s z test when applied to 
an actual example of non-normal data,” Journal of Agricultural Science, 
23 (1933), 8-17.

[6] Fisher, R. A., “The goodness of fit of regression formulae, and the distribu- 
tion of regression coefficients,” Journal of the Royal Statistical Society, 85 
(1922), 597-612. 

[7] Fisher, R. A., Statistical Methods For Research Workers. First Edition. Edin- 
burgh: Oliver and Boyd. 1925. 

[8] Fisher, R. A., “The arrangement of field experiments,” Journal of Ministry
of Agriculture, 33 (1926), 503-13. 





[9] Fisher, R. A., The Design Of Experiments. Edinburgh: Oliver and Boyd,
1935.

[10] Fisher, R. A., “Discussion of paper by Neyman, J. et al.,” Supplement to
the Journal of the Royal Statistical Society, 2 (1935), 107-54.

[11] Fisher, R. A., “The coefficient of racial likeness and the future of
craniometry,” Journal of Royal Anthropological Institute, 66 (1936), 57-63.

[12] Fisher, R. A., and Yates, F., Statistical Tables. Edinburgh: Oliver and Boyd,
1938.

[13] Grundy, P. M. and Healy, M. J. R., “Restricted randomization and
quasi-Latin Squares,” Journal of the Royal Statistical Society, B-12 (1950), 286-91.

[14] Kempthorne, Oscar, The Design And Analysis Of Experiments. New York:
John Wiley and Sons, 1952.

[15] Neyman, J., with the cooperation of K. Iwaszkiewicz and St. Kolodziejczyk,
“Statistical problems in agricultural experimentation,” Supplement to the
Journal of the Royal Statistical Society, 2 (1935), 107-54.

[16] Nissen, Øivind, “The use of systematic 5×5 Latin squares,” Biometrics, 7
(1951), 167-70.

[17] Pitman, E. J. G., “Significance tests which can be applied to samples from
any populations. III. The analysis of variance test,” Biometrika, 29 (1937), 
322-35. 

[18] Tang, P. C., “The power function of the analysis of variance tests with 
tables and illustrations of their use,” Statistical Research Memoirs, 2 (1938), 
126-49. 

[19] Tukey, John W., “One degree of freedom for non-additivity,” Biometrics, 5
(1949), 232-42. 

[20] Tukey, John W., “Dyadic anova, an analysis of variance for vectors,” 
Human Biology, 21 (1949), 65-110. 

[21] Welch, B. L., “On the z test in randomized blocks and Latin Squares,” 
Biometrika, 29 (1937), 21-52. 

[22] Yates, F., “The analysis of replicated experiments when the field results are 
incomplete,” Empire Journal of Experimental Agriculture, 1 (1933), 129-42. 

[23] Yates, F., “The principles of orthogonality and confounding in replicated 
experiments,” Journal of Agricultural Science, 23 (1933), 108-45. 

[24] Yates, F., “Complex experiments,” Supplement to the Journal of the Royal 
Statistical Society, 2 (1935), 181-247. 

[25] Yates, F., “Incomplete Latin squares,” Journal of Agricultural Science, 26 
(1936), 301-15. 





STATISTICAL ABSTRACTS 


All communications concerning this section should be addressed to
the Abstracts Editor, Professor W. L. Smith, Department of
Statistics, University of North Carolina, Chapel Hill, North Carolina.


Bross, I., “A confidence interval for a 
percentage increase,” Biometrics, 10 (1954), 
245-50. 


Let there be two binomial populations
with parameters $p_1$ and $p_2$. Then the
relative excess (or increase) of $p_2$ over $p_1$ may
be defined as $\theta = 100(p_2 - p_1)/p_1$. Given
samples of size $n_1$ and $n_2$ respectively, with
number of occurrences $x_1$ and $x_2$, the
problem is to find a confidence interval for $\theta$.
The solution is obtained under the
condition that $p_1$ and $p_2$ are sufficiently small, and
$n_1$ and $n_2$ are sufficiently large to take $x_1$ and
$x_2$ as having Poisson distributions. Then the
joint distribution of $x_1$, $x_2$ may be written as
the product of the distribution (Poisson) of
$x_1 + x_2$ with the conditional distribution
(binomial) of $x_1$ given $x_1 + x_2$. The binomial
parameter in the latter distribution is
$P = n_1 p_1/(n_1 p_1 + n_2 p_2)$, which is a monotonic
function of $\theta$. The result quickly follows
that $1 - \alpha = P\{L < \theta < U\}$ where $L$ and $U$
are defined by
$L = 100\,(n_1 - (n_1 + n_2)(\mathrm{U.L.}))/(n_2(\mathrm{U.L.}))$ and
$U = 100\,(n_1 - (n_1 + n_2)(\mathrm{L.L.}))/(n_2(\mathrm{L.L.}))$,
where U.L. and L.L. are upper
and lower end points of a $1 - \alpha$ confidence
interval for $P$ as defined above. Lincoln
Moses, Stanford University.
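
The last step of the argument summarized above, mapping an interval for the conditional binomial parameter P to an interval for the percentage increase, can be sketched as follows. This is an illustration only: the formula is obtained by solving P = n1·p1/(n1·p1 + n2·p2) for p2/p1, the function name is hypothetical, and the interval for P is taken as given (how it is computed is not specified in the abstract).

```python
def percent_increase_interval(n1, n2, p_lower, p_upper):
    """Convert a confidence interval (p_lower, p_upper) for
    P = n1*p1 / (n1*p1 + n2*p2) into an interval for
    theta = 100 * (p2 - p1) / p1.

    theta decreases as P increases, so the upper end point of the
    interval for P gives the lower limit for theta and vice versa.
    """
    def theta(p):
        return 100 * (n1 - (n1 + n2) * p) / (n2 * p)

    return theta(p_upper), theta(p_lower)
```

For equal sample sizes, an interval for P of (0.4, 0.6) corresponds to limits of roughly −33 and +50 per cent, since P = 0.4 means p2/p1 = 1.5 and P = 0.6 means p2/p1 = 2/3.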


Bross, I., “Misclassification in 2×2 tables,”
Biometrics, 10 (1954) 478-86. 


In sorting the elements of a sample into
two classes there are two possible kinds of
error. Let the probabilities of these two
kinds of error be called $\theta$ and $\phi$; they are
“parameters of the classification scheme.” If
both are zero, then every individual is
certain to be classified correctly. If not both are
zero then in general the estimate of the
binomial parameter $p$ corresponding to the
classification will be biased, and so will its
estimated variance. When the same
classification scheme is applied to samples from
two populations and the equality of $p_1$ and
$p_2$ is being tested, the test remains valid (i.e.,
has the nominal level of significance) but its
power is reduced. The estimate of the
difference $p_1 - p_2$ will be biased. Lincoln
Moses, Stanford University.


Chernoff, Herman, “Rational selection of
decision functions,” Econometrica, 22
(1954), 422-43.


The author states as his fundamental 
purpose in this paper “.. . to see whether 
the theory of decision functions shows 
promise of being applicable to ‘real’ problems
and not necessarily to specify how the
theory is to be applied.” 

Following a few comments on “min max 
risk,” “min max regret,” and the “subjective 
approach” as alternative criteria for select- 
ing a decision function, the author presents 
a set of ten postulates which are regarded as 
descriptive of a rational approach to select- 
ing a strategy. The author points out that in 
his conception of the problem “...a ra- 
tional criterion can be precisely described 
only in terms of the postulates. On the other 
hand, an intuitive notion leading to the 
selection of these postulates is the following. 
In a given problem, the statistician should 
first eliminate those strategies which are ob- 
viously bad. He should then dispose of some 
of the remaining which, while not so obviously
bad, still fail to make the grade. After
a certain amount of eliminating, the remaining
strategies will be considered adequate.
The statistician will have no reason to prefer 
any of these strategies to the others. The 
set of these strategies will be called the solu- 
tion of the problem. It is not implied that 
the statistician necessarily considers that 
any two elements of the solution are equiva- 
lent.” In the development of the set of 
postulates, controversial postulates are indicated
and those playing a critical role are
discussed in some detail. 

Postulates 1 through 8 are applied in the 
proofs of three theorems in a section labeled 
“main results.” (This reviewer cannot claim 
to have checked the proofs.) Theorem 1 
states that if a rational solution for the 
simplified formulation exists, randomized 
strategies are unnecessary. Through theorem 
2 it is shown that regret matrices are relevant,
i.e., the statistician may base his
choice of strategy on the regret matrix. 
Theorem 3 (regarded as the main result) 
shows that for the class of all mixed prob- 
lems involving n states of nature and a 
finite number of pure strategies, the unique
criterion satisfying the postulates formulated
is equivalent to assuming that each
state of nature has an a priori probability 
of 1/n. Following the proofs of the theorems 
is a section given over to an interpretation 
of the results and a consideration of their 
implications with respect to the possibility 
of a rational approach to real problems via 
the decision function formulation. 
Additional “miscellaneous” results are 
presented in a brief concluding section. 
Ivan M. Lee, University of California.


Cochran, W. G., “Some methods for
strengthening the common χ² tests,”
Biometrics, 10 (1954), 417-51.


The author cites two difficulties which 
commonly beset the use of the χ² test. First,
the test is often used where fairly clear 
alternatives to the null hypothesis are con- 
templated, but it is not tailored to take 
care of them; second, when the null hy- 
pothesis is rejected the question often re- 
mains, in what way does it fail to be true? 
The bulk of the paper is devoted to mitigat- 
ing both of these difficulties. For testing 
goodness of fit to Poisson or binomial
distribution the variance test ($\sum (x_i - \bar{x})^2/\bar{x}$ in
the case of the Poisson) is recommended as
ordinarily more sensitive than the general 
goodness of fit test; and when the latter is 
used the “no expectation less than 5” rule is 
probably unnecessarily strict, and wasteful 
of power. Analogously a test of skewness 
and/or kurtosis is ordinarily a more sensi- 
tive test of normality than the general 
goodness of fit test. For each of these
distributions methods are offered for testing that
the observed frequencies are subject to
linear regression (e.g., in time) or that there
is a division into the first k “high”
frequencies and the remaining n−k “low”
frequencies; and for each distribution there is
given a method for constructing a single 
degree of freedom corresponding to any 
linear combination of the frequencies. Con- 
tingency tables can be broken down into sub- 
tables for analysis. Methods and principles 
involved are discussed. In particular whether 
one should use an additive or nonadditive 
decomposition depends upon whether the 
table as a whole exhibits significance. Where 
one of the classifications has a natural order 
(such as degree of improvement) special pro- 
cedures such as the use of “scores,” are 
available, and natural. Methods of combining
several 2×2 contingency tables are
taken up. Where the data are affected with
variation in addition to binomial or Poisson
variation, then χ² analysis may be
inappropriate; suitable transformation and
application of the F test is likely to be preferable.
Lincoln Moses, Stanford University.
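
The variance test mentioned in the abstract above is simple to compute. A minimal sketch follows; the function name is hypothetical, and the statement that the statistic is referred to χ² with n−1 degrees of freedom is the usual textbook convention rather than something stated in the abstract.

```python
def poisson_variance_test(counts):
    """Variance (index of dispersion) test for Poisson counts.

    Returns the statistic sum((x_i - mean)^2) / mean together with its
    degrees of freedom, n - 1. Large values suggest more variation
    than a single Poisson distribution would produce.
    """
    n = len(counts)
    mean = sum(counts) / n
    statistic = sum((x - mean) ** 2 for x in counts) / mean
    return statistic, n - 1
```

For the counts 2, 4 and 6 the mean is 4, the sum of squared deviations is 8, and the statistic is therefore 2 on 2 degrees of freedom.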


Collier, R. O., Jr., “The least-squares
analysis of a p×q×r factorial design with
unequal subclass frequencies,” Journal of
Experimental Education, 22 (1954), 297-83.


Using fixed main and interaction effects, 
Collier writes an interaction-zero model and 
a model “which attributes to each ijkth sub- 
class a fixed effect,” providing a “‘pooled 
interaction’ sum of squares.” He then out- 
lines the analysis of variance procedure and 
its application to a typing-reading
experiment. Julian C. Stanley, University of
Wisconsin.


Durand, D., “Bank stocks and analysis of
covariance,” Econometrica, 23 (1955), 30-45.


In this paper the author presents results of 
one phase of a broader study analyzing 
bank stock prices by regression methods. 
The particular questions to which the author 
gives attention in the present paper bear on 
the validity of certain specifications of 
classical regression in the analysis of certain 
types of pooled cross-sectional and time
series data. The type of data employed per- 
mitted suggestive tests of the specifications 
questioned. 

The regression equation to be fitted in an 
analysis of the generation of bank stock 
prices is the following:
$\log P = k' + b \log B + d \log D + e \log E$,
where P = bank stock price, B = book value per share,
D = dividends per share, and E = earnings per share.
The main phase of the analysis is based on 
annual data for 117 banks covering the 
period 1945-1952. One question considered 
is whether the total of 936 observations 
might be pooled for analysis to reduce the 
standard errors of the estimated coeffi- 
cients. Pooling implies, of course, the as- 
sumption of uniformity of regression rela- 
tions for different banks and for different 
years. The validity of this assumption is 
tested, employing methods of covariance 
analysis. For this purpose the 117 banks 
are divided into six groups, primarily on a 
geographical basis. A regression equation of 
the above form fitted to each year within 
each bank group provides 48 sets of regression
coefficients in a 6×8 layout. This permits
tests of the uniformity of regression
relations between groups of banks within
years and between years within bank 
groups. Also, on the basis of this design, 
residual variances were tested for homo- 
geneity over years and over bank groups, 
using Bartlett’s test statistic. 







The author concludes that the above tests 
suggest heterogeneity in regression slopes 
and variances among groups of banks and, 
perhaps less conclusively, among years 
within groups. This in turn suggests that, 
in pooling to increase the number of ob- 
servations, the stocks included within a 
group would have to be carefully selected 
for homogeneity and the time period cov- 
ered would have to be fairly stable and rela- 
tively short. This led the author to explore 
the use of quarterly data as a possibility for 
increasing the effective number of observa- 
tions yet restricting the time dimension to a 
relatively short period. For this phase of the 
analysis, quarterly data for 15 of the 17 
banks in one geographical group (New 
York) were employed. The data covered 
eight consecutive quarters beginning with 
the last quarter of 1951, thus giving a
two-way 8×15 “experimental layout.” The
variance analysis model employed assumed
uniform coefficients b, d, and e and
assumed the constant term k′ composed of
three independent components:
$k'_{ij} = k^{*} + a_{i0} + a_{0j}$, where $k^{*}$ is unvarying,
$a_{i0}$ (i = 1, 2, ..., p) is the influence of the ith bank,
$a_{0j}$ (j = 1, 2, ..., q) is the influence of the
jth quarter, and $\sum_i a_{i0} = \sum_j a_{0j} = 0$. The
resulting test suggests significant individual bank
effects. Finally, the hypothesis of zero auto- 
correlation of residuals in different quarters 
is subjected to test. Two testing procedures 
are employed. One test was based on von 
Neumann ratios calculated for each bank 
from quarterly residuals derived from the 
deviations from regression in the cells of the 
8×15 experimental layout referred to
above. A second testing procedure (which 
is only mentioned) involved replicating 
price data within each quarter and testing 
for bank-quarter interaction. These tests 
lead to the rejection of the hypothesis of 
zero auto-correlation of quarterly residuals. 

In a concluding section the implications 
of the test results are discussed briefly. Ivan
M. Lee, University of California.


Edwards, Daisy S., and Parkin, S. J., 
“Empirical investigation of the problem of 
disproportionate frequencies in analysis of 
covariance as applied to a methods experi- 
ment,” Journal of Experimental Education, 
22 (1954), 275-64. 


For proportionate numbers of subjects in 
18 method-school groups, F’s were slightly 
smaller than when moderately dispro- 
portionate subclass frequencies, judged by 
the authors to be typical of educational 
experimentation, were used. Addition of a 
seventh school in which frequencies were
quite disproportionate increased the
“school” and “method × school” F's
further. “Even with extremely disparate
numbers ... we obtain values of F for which
the probabilities are very similar” to those
for proportionate subgroups. Julian C.
Stanley, University of Wisconsin.


Epstein, Benjamin and Milton Sobel,
“Sequential life tests in the exponential
case,” Annals of Mathematical Statistics,
26 (1955), 82-93.


The authors describe sequential life test
procedures when the underlying distribution
is given by the exponential distribution.
They assume n random items are available
for life testing from the given distribution
and wish to test $H_0\colon \theta = \theta_0$ against
$H_1\colon \theta = \theta_1$ with Type I error $= \alpha$ and Type
II error $= \beta$. The decision to continue the
experiment is made as long as the inequality
$B < (\theta_0/\theta_1)^{r} \exp[-(\theta_1^{-1} - \theta_0^{-1})\,V(t)] < A$
holds. The constants, B and A,
depend on $\alpha$ and $\beta$. V(t) is a function of t
which represents the total life observed
up to time t. At the time the experiment is
stopped, $H_0$ or $H_1$ is accepted according to
violation of the first or second inequality.
The authors, then, give two parametric
equations which determine approximately
the O.C. curve and tabulate the approximate
values of the expected number of
observations required to reach a decision
when $\theta$ is the true parameter [viz., $E_\theta(r)$]
for sequential tests of various values of
$k = \theta_0/\theta_1$ and $\alpha$, $\beta$.

The authors also calculated exactly $L(\theta)$
and $E_\theta(r)$ in special cases and obtained an
upper and lower bound for them. They gave
eight different problems, with solutions, to
which these methods have been applied.
A. E. Sarhan, University of North Carolina.
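
The continuation rule described in the abstract above can be sketched as a single decision step. Several details here are assumptions and are not taken from the abstract: r is read as the number of failures observed so far, the boundaries use Wald's approximations A = (1 − β)/α and B = β/(1 − α), and the function name and return strings are illustrative.

```python
import math


def sequential_life_test(r, total_life, theta0, theta1, alpha, beta):
    """One step of a sequential probability ratio test for the mean
    life of an exponential distribution.

    r          -- number of failures observed so far (assumed meaning)
    total_life -- V(t), total life accumulated on all items up to time t
    Returns 'accept H0', 'accept H1' or 'continue'.
    """
    # Wald's approximate boundaries (an assumption; the abstract says
    # only that A and B depend on alpha and beta).
    log_a = math.log((1 - beta) / alpha)
    log_b = math.log(beta / (1 - alpha))

    # Logarithm of (theta0/theta1)^r * exp(-(1/theta1 - 1/theta0) * V(t)).
    log_ratio = (r * math.log(theta0 / theta1)
                 - (1 / theta1 - 1 / theta0) * total_life)

    if log_ratio <= log_b:
        return "accept H0"
    if log_ratio >= log_a:
        return "accept H1"
    return "continue"
```

Working on the logarithmic scale avoids overflow when r or V(t) is large. With θ0 = 1000, θ1 = 500 and α = β = 0.05, long failure-free running favours H0, while several early failures favour H1.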


Federer, W. T. and Schlottfeldt, C. S., “The 
use of co-variance to control gradients in 
experiments,” Biometrics, 10 (1954), 282-90. 


Using an illustrative example as a vehicle, 
an analysis of variance with quadratic co- 
variance is given. The method is presented 
as a possible alternative to the Latin square
design, in some connections. Lincoln
Moses, Stanford University.


Graybill, Franklin, “Variance heterogeneity 
in a randomized block design,” Biometrics, 
10 (1954), 516-20. 


In a randomized block design for comparing
treatments two assumptions used in the
analysis of variance may fail to hold in
certain instances; there may be different
variances associated with the different
treatments, and the errors within any block may
be correlated. The customary F test is then
inapplicable, but Hotelling's T² may be
validly applied. A worked example is given.
Lincoln Moses, Stanford University.


Grundy, P. M., Rees, D. H., and Healy, 
M. J. R., “Decision between two alterna- 
tives—how many experiments?”, Bio- 
metrics, 10 (1954), 317-23. 


The following problem is posed: a new
process will result in a gain of (η−c)k′ if
put into practice. In the above expression
k′ is a known constant determined by the
proposed scale of application of the new
process; (η−c) is the difference between
increase in yield per unit and cost per unit.
The experimenter pays k units per
experiment; he performs an initial experiment,
obtaining an observation y (assumed
normal, with known variance σ²). The
authors, using the theory of fiducial
inference, show how to select n, the number of
further experiments to be performed, in
order optimally to balance the additional
cost of experimentation against the
expected gain due to taking the correct
decision (i.e., adopting or rejecting the new
process). Lincoln Moses, Stanford
University.


Hamburger, William, “The relation of 
consumption to wealth and the wage rate,” 
Econometrica, 23 (1955), 1-17. 


The author is concerned with the relation 
C=C(R), where C is per capita current con- 
sumption and R is per capita “lifetime 
consuming power anticipations.” It is 
pointed out that the most commonly used 
index of R is current disposable income. In 
the author’s formulation, a single measure 
such as disposable income is considered too 
comprehensive, i.e., it carries the implicit 
assumption that the relation of different 
types of income to anticipated lifetime 
consuming power are stable and identical. 
In the formulation of the present paper, 
“human wealth” and “property wealth” 
are explicitly recognized as separate bases of 
lifetime consuming power anticipations. It 
is noted that measures of wealth rather 
than income are regarded conceptually as 
the more appropriate measures for re- 
flecting consumer power anticipations. 

A measure proportional to annual wage 
rate (kL), where L is annual wage rate, is 
used as an index of human wealth. Property 
wealth is measured as the sum of two com- 
ponents (G+jE) where G consists of net 
government debt to private sector, net
foreign debt to private sector, and stocks of
consumer durables, and E, an index of all
other private wealth, is the sum of net 
interest, dividends, proprietors’ income, and 
rental income. Specific sources and various 
adjustments introduced in constructing 
these measures are set out in an appendix. 

A linear function was fitted by least- 
squares to annual data for the period 
1929-41 and 1947-50. Written without 
explicit recognition of different components 
of consuming power anticipations, the linear 
relation is C=aR+b. Substituting for R 
the separate components indicated above 
this relation becomes C = aG + ajE + akL + b.
Before fitting, this function was deflated by 
wage rate (L) giving C/L=aG/L+ajE/L 
+ak+b/L. The fitted equation provides 
estimates of the separate coefficients a, j, k, 
and b. Estimates of a, j, and k were also 
obtained from a relation omitting b/L from 
the above relation (i.e., on the assumption 
b=0). 

Results are summarized and discussed 
briefly, directing attention to the suggestive 
character of the results from the substantive 
point of view. Ivan M. Lee, University of
California.
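
The step from the fitted deflated equation to the separate coefficients a, j, k and b in the abstract above is a matter of simple algebra, sketched below. The argument names are hypothetical labels for the four fitted coefficients of C/L = aG/L + ajE/L + ak + b/L; nothing here reproduces the paper's actual estimates.

```python
def structural_parameters(coef_g_over_l, coef_e_over_l, intercept,
                          coef_inv_l):
    """Recover a, j, k and b from the fitted coefficients of
    C/L = a*(G/L) + a*j*(E/L) + a*k + b*(1/L).

    coef_g_over_l -- coefficient on G/L, equal to a
    coef_e_over_l -- coefficient on E/L, equal to a*j
    intercept     -- constant term, equal to a*k
    coef_inv_l    -- coefficient on 1/L, equal to b
    """
    a = coef_g_over_l
    return {
        "a": a,
        "j": coef_e_over_l / a,
        "k": intercept / a,
        "b": coef_inv_l,
    }
```

Because j and k are ratios of fitted coefficients, their sampling errors depend on the error in the estimate of a as well; the sketch ignores this and returns point values only.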


Hayman, B. I., “The analysis of variance of 
diallel tables,” Biometrics, 10 (1954), 
235-44. 

A powerful method of investigating genetical
properties of several, say n, inbred
lines is to construct all n×n single crosses
(each line appearing once as male and once 
as female) and selfs; the resulting array of 
n? observations is called a diallel table. 
There is developed an analysis of variance 
model which permits testing for the exist- 
ence of genetic variability among lines, 
dominance effects and effects associated 
with a line entering as male rather than 
female parent. Methods of conducting large 
diallel crosses in latin squares and related 
designs are discussed. A clearly worked out
8×8 diallel cross is presented as an
arithmetical example. Lincoln Moses, Stanford
University.


Kimball, A. W., “Short cut formulas for the
exact partition of χ² in contingency tables,”
Biometrics, 10 (1954), 452-8.


Convenient computing methods appro- 
priate to decomposition of contingency 
tables are presented and illustrated. An 
r×s table can be broken down into (r−1)(s−1)
individual tables, each of size 2×2.
Lincoln Moses, Stanford University.


Li, C. C. and Sachs, Louis, “The derivation
of joint distribution and correlation
between relatives by the use of stochastic
matrices,” Biometrics, 10 (1954), 347-60.


Two classical problems in genetics are to 
determine the probabilities for various geno- 
typic combinations between relatives (of 
specified relationship) and to find genotypic 
correlations between relatives. The present 
paper, using matrices of conditional transi- 
tion probabilities, gives a straightforward 
method of solving both problems, which are
ordinarily quite cumbersome to deal with.
Lincoln Moses, Stanford University.


Mandel, J., “Chain block designs with two- 
way elimination of heterogeneity,” Bio- 
metrics, 10 (1954), 251-72. 


A new class of designs, generalized from
the recent chain block design, is presented. 
The assumptions are the ones ordinary to 
incomplete block designs. It is necessary 
only that the number of treatments be a 
multiple of the number of blocks. The
designs are quite flexible and easily analyzed.
Both the theory and computations are 
clearly set forth in the paper. Lincoln
Moses, Stanford University.


Moore, P. G., “A note on truncated Poisson
distributions,” Biometrics, 10 (1954), 402-6.


An earlier paper considered a rapid
less-than-efficient estimate of the parameter $\lambda$
in which frequencies greater than or equal to
s were not observed. The estimate there was
$\hat{\lambda} = \sum_{r=1}^{s-1} r\,n_r \big/ \sum_{r=0}^{s-2} n_r$, which is suggested by
the identity
$\sum_{r=1}^{s-1} r \lambda^{r} e^{-\lambda}/r! = \lambda \sum_{r=0}^{s-2} \lambda^{r} e^{-\lambda}/r!$.
Three other types of truncation are
sidered in this paper and estimates of 
analogous type are offered. There is given 
some indication of the loss of efficiency as- 
sociated with those (very easily calculated) 
estimates. The three additional types of 
truncation are: 

a) Frequencies less than or equal to k not observed.
b) Only frequencies between s and k (s<k) observed.
c) All frequencies except those between s and k observed.

Lincoln Moses, Stanford University.


Nelder, J. A., “A note on missing plot 
values,” Biometrics, 10 (1954), 400-1. 


Where only one observation is missing in 
a balanced design, a convenient method of 
analysis calls for the construction of fic- 
titious “observation” after which the com- 
plete-design analysis yields an unbiased 
error sum of squares. This invented value is 
only a computational device and “is not 


AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1955 


intended as an estimate of the missing datum.”
The author points out that it, none the less,
is an estimate of the missing datum, an
unbiased estimate if the analysis of variance
model is satisfied. Hence, if the invented
value is absurd it stands as a warning that
the model may not hold for the scale of
measurement used, and that a transforma-
tion may profitably be sought. Lincoln
Moses, Stanford University.


Rao, C. R., “Estimation of relative potency 
from multiple response data,” Biometrics, 
10 (1954), 208-20. 


It may be in a biological assay that two 
(or more) graded responses are observed for 
each animal. The relative potency can be 
estimated from either series of responses.
If they give essentially equal estimates then
an optimum linear combination of the re-
sponse variables may be sought in terms of
which the estimation will be most efficient.
This problem is treated for two response 
variables from a non-theoretical—but in- 
stead, fully computational—standpoint. 
This is done in such a way as to enable im-
mediate generalization of computational
methods to the k-response (k>2) case. Lincoln
Moses, Stanford University.


Sarkar, D. and Laha, R. G., “A modifica- 
tion of the variate-difference method,” 
Econometrica, 23 (1955), 67-72. 


In the classical variate difference method, 
a time series is assumed to be the sum of 
two components—a polynomial trend and 
a random element distributed independ- 
ently with zero mean and finite variance. 
The modification in this paper is proposed 
to deal with time series in which prominent 
cyclical variations are present. It is as- 
sumed that the time series observations are 
a sum of a polynomial trend, a cycle of 
known period, and an irregularity, giving 
(1) $y_t = a_0 + a_1 t + a_2 t^2 + \cdots + a_p t^p + A \sin((2\pi t)/\lambda + \alpha) + \epsilon_t$; $(t = 1, 2, \ldots, N)$
where p is the unknown degree of the poly-
nomial trend, $\lambda$ is the known period of the
cycle, and $\epsilon_t$ is a random component inde-
pendently distributed with mean zero and
variance $\sigma^2$. An unbiased estimate of $\sigma^2$ is
derived. An estimate of $\sigma^2$ is also derived
for the more general case in which the
cyclical component is assumed to be the
resultant of several harmonics of different
but known periods $\lambda_1, \lambda_2, \ldots, \lambda_k$. It is
noted that the generalized method can be
applied if the $\lambda_i$ are not known if it is known
or can be assumed that $\lambda_i = \lambda/i$,
$(i = 1, 2, \ldots, k \le \lambda)$. Results of an application of







STATISTICAL ABSTRACTS 


formulation (1) above for $\lambda = 7$ and $\lambda = 9$ are
reported and compared with results of the
classical method. The results suggest that
the estimates of variance tend to stabilize
at an earlier stage in the modified method.
They also suggest that variance estimates
are not highly sensitive to small differences
in $\lambda$. Ivan M. Lee, University of California.
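
For orientation, the classical variate-difference estimate that the paper modifies can be sketched as follows. This shows the classical method only, not the authors' modification for cyclical components. It relies on the standard fact that the k-th difference of an independent series with variance sigma^2 has variance C(2k, k) sigma^2, and that differencing more than p times removes a polynomial trend of degree p. The function names are hypothetical.

```python
from math import comb


def difference(series, k):
    """Return the k-th successive difference of a series."""
    out = list(series)
    for _ in range(k):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


def variate_difference_variance(series, k):
    """Classical variate-difference estimate of the random-component variance.

    The sum of squared k-th differences is divided by (N - k) * C(2k, k).
    The estimate is appropriate only once k exceeds the trend degree.
    """
    diffs = difference(series, k)
    return sum(v * v for v in diffs) / (len(diffs) * comb(2 * k, k))


# A noise-free quadratic trend is removed entirely by third differences.
trend = [1 + 2 * t + 3 * t * t for t in range(20)]
print(variate_difference_variance(trend, 3))
```

In practice the estimate is computed for k = 1, 2, 3, ... and the order at which it stops changing appreciably is taken as sufficient; this is the "stabilizing" that the abstract refers to.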


deVergottini, Mario, “Sugli indici di de- 
pressione,” Studi Economici, n° 9/6, No- 
vember, 1954, 445-50. 


To discuss the intensity of underde-
velopment, one may use different indices.
The usual index $I_1 = (a_1 - M)/M$, with $a_1$ the per
capita income of the depressed area and
$M$ the national income per capita, does not fit
the problem well. It is linked directly to
the intensity of underdevelopment, but
inversely to the number of people suffering
from it. In a country where the depressed area
is very large and $M$ is near the bottom, this
index stays low and does not express the
gravity of the problem. Because of its
construction, it also does not allow
comparison between two countries. Some simple
indexes expressing the depth of the de-
pression $a_2 - a_1$ in reference to $M$, $a_1$, or $a_2$ are
also used.

The index $I_3 = (a_1 - a_2)p_1/M$, with $a_2$ the in-
come per capita of the non-depressed area
and $p_1$ the percentage of the total popula-
tion in the depressed area, would be better.
It is directly linked to the intensity of the
depression, but also to the number of people
in the depressed area. $I_3$ corresponds to the
percentage increase of the total national
income that would raise per capita income
in the depressed areas to equal per capita in-
come in the high-level areas. International
comparisons are therefore not possible, $I_3$
depending upon the actual situation in
each country.
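
The contrast between the usual index and the population-weighted index can be checked numerically. The sketch below is illustrative only. It assumes a country made up of exactly two areas, so that the national per capita income is the population-weighted mean of the two area incomes, and it takes the population share as a fraction rather than a percentage. The function and variable names are hypothetical.

```python
def depression_indices(a1, a2, p1):
    """Return (M, usual_index, weighted_index) for a two-area country.

    a1: per capita income of the depressed area
    a2: per capita income of the non-depressed area
    p1: share of the total population living in the depressed area (0..1)
    """
    p2 = 1.0 - p1
    national = p1 * a1 + p2 * a2          # assumption: only two areas
    usual_index = (a1 - national) / national
    weighted_index = (a1 - a2) * p1 / national
    return national, usual_index, weighted_index


# Same income gap, small versus large depressed population.
for share in (0.2, 0.8):
    national, usual, weighted = depression_indices(50.0, 100.0, share)
    print(share, round(national, 1), round(usual, 3), round(weighted, 3))
```

Under the two-area assumption the usual index equals (a1 - a2) times the non-depressed share divided by M, so its magnitude shrinks as the depressed population grows, while the weighted index grows in magnitude. The magnitude of the weighted index times M is the extra income per head of total population needed to close the gap.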

Some other indexes can be constructed to 
take into account the fact that depressed 
areas are not monolithic. The index 
L4=(a2—4a1) m72/M is of this type, giving 
the dispersion relative to the mean.

The conclusion is that no judgment on de-
pression can be made without careful ex-
amination of the indexes used for demon-
stration, the nature of the index having a big
impact on the results obtained. J. Davin,
Conseil de l’Europe.


Worcester, J., “How many organisms?”
Biometrics, 10 (1954), 221-8.


In problems of biological assay involving 
quantal response, it sometimes happens 
that the dose is an aliquot from a suspension 
of organisms. The dose is the actual number 
of organisms applied, and is discrete, and 
subject to error, which may or may not be 
negligible as compared with the dilution- 
interval. Two general sorts of models in use 
for the problem are considered, and a third 
is introduced and illustrated. The standard 
logit and probit methods regard the dose as 
not subject to error; the “most probable 
number” method is based on the assump- 
tion that any animal actually receiving one 
or more organisms must then exhibit the 
response. The model introduced assumes
that the probability of response to an actual
dose of $z$ organisms is of the form $z/(z+b)$;
the “most probable number” method cor-
responds to the use of $b=0$. Lincoln Moses,
Stanford University.
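
A small numerical sketch may help show how the response form z/(z+b) behaves. It is an illustration under an added assumption that the abstract does not state: the actual number of organisms in an aliquot is taken to follow a Poisson distribution about the nominal mean dose. The function name is hypothetical.

```python
import math


def response_probability(mean_dose, b, max_terms=200):
    """Probability of response when the actual dose Z is Poisson(mean_dose)
    (an assumption of this sketch) and P(response | Z = z) = z / (z + b)."""
    total = 0.0
    poisson_term = math.exp(-mean_dose)      # P(Z = 0); contributes nothing
    for z in range(1, max_terms):
        poisson_term *= mean_dose / z        # now equals P(Z = z)
        total += poisson_term * z / (z + b)
    return total


# With b = 0 every animal receiving at least one organism responds, so the
# result is 1 - exp(-mean dose), the "most probable number" form.
print(response_probability(2.0, 0.0), 1.0 - math.exp(-2.0))
print(response_probability(2.0, 1.0))
```

Increasing b lowers the response probability at every dose, which is how the model allows for organisms that are received but fail to produce a response.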





BOOK REVIEWS 


Statistical Problems of the Kinsey Report. William G. Cochran, Frederick 
Mosteller, John W. Tukey. Washington, D. C.: The American Statistical Associa-
tion. 1954. Pp. x, 338. $5.00; $3.00 to members of the American Statistical
Association. See review article on pages 811-29. 

Colonial Social Accounting. Phyllis Deane. Cambridge, England: The National
Institute of Economic and Social Research, Economic and Social Studies XI, 
1953. Pp. xv, 360. $10.00. See review article by William O. Jones, on pages 665- 
76. 


The Design and Analysis of Experiments. Oscar Kempthorne. New York: John 
Wiley and Sons, Inc., 1952. Pp. xix, 631. $8.50. 


George E. P. Box, Imperial Chemical Industries, Ltd.


This is an extremely lucid and full account of the design of agricultural
experiments and related statistical theory. 

The book contains 29 chapters. The early chapters provide a discussion 
of scientific method and the principles of experimental design exemplified by 
Sir Ronald Fisher’s famous tea-tasting experiment. There follows a brief 
but useful summary of elementary statistical theory, the normal distribu- 
tion, and derived sampling distributions, and a discussion of such important 
concepts as orthogonality, estimation, and likelihood ratio tests. A discus- 
sion of least squares theory and the general linear hypothesis appears next. 
Here the general results and proofs are set out in the compact and readily 
appreciated forms possible with matrix notation. A discussion of analysis of 
variance classifications then follows using the normal theory developed in 
previous chapters. 

At this point randomization theory is introduced. Following Fisher, 
Pitman, and Welch, the view is taken that the justification for the use of 
normal theory is that, for tests to compare means, normal theory supplies 
a good approximation to the result which would have been obtained had the 
full randomization test been performed. This view is sustained throughout 
most of the rest of the book which contains a full account of the standard 
agricultural designs: randomized blocks, latin squares, factorial experiments, 
split plots, fractional replication, lattice designs, etc. 
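
The full randomization test that normal theory is said to approximate can be sketched for the simplest case, a comparison of two treatment means in a completely randomized layout. This is an illustration only; the book's designs (randomized blocks, latin squares and so on) restrict the set of admissible rearrangements, which this sketch does not attempt. The function name is hypothetical.

```python
from itertools import combinations


def randomization_p_value(group_a, group_b):
    """Two-sided randomization test for a difference in means.

    Every way of assigning len(group_a) of the pooled observations to the
    first treatment is enumerated, and the p-value is the proportion of
    assignments whose absolute mean difference is at least the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def abs_mean_difference(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_difference(sum(group_a))
    assignments = 0
    as_extreme = 0
    for indices in combinations(range(len(pooled)), n_a):
        assignments += 1
        candidate = abs_mean_difference(sum(pooled[i] for i in indices))
        if candidate >= observed - 1e-12:    # tolerance for float rounding
            as_extreme += 1
    return as_extreme / assignments


print(randomization_p_value([1, 2, 3], [4, 5, 6]))
```

The number of assignments grows combinatorially with the number of plots, which is the practical reason for preferring the normal-theory approximation in experiments of any size.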

The author refers to the randomization test model as the finite model, 
the normal theory model as the infinite model. This is perhaps not a happy 
distinction, for many, including the reviewer, would prefer to regard the
randomization test as having an infinite reference set. 

For the reader who wishes to apply the techniques in fields other than 
agriculture, some caution is necessary. It has sometimes been rashly assumed 
that the strategy of experimentation specifically developed for agricultural 
application is directly transferable into all other fields. This is not so. 

The basic characteristics of agricultural experimentation are: (1) A long 
time (about one year) must elapse between the planning of an experiment 
and the obtaining of the results, which then all become available together. 




(2) The experimental error is large. Since in these circumstances another 
year must elapse before questions still in doubt can be resolved, large ‘com- 
prehensive’ designs are employed. 

By contrast, the basic characteristics of much industrial experimentation 
are: (1) Experiments are usually performed one after the other and com- 
paratively rapidly. (2) The experimental error is often somewhat smaller 
than in agriculture. In these circumstances sequential experimentation is 
the natural procedure. The results of a comparatively small group of trials 
are used to plan the next set and each subsequent set is planned in the light 
of all the knowledge available up to that time. In this way it is ensured that 
no more observations are made than is necessary to obtain the desired pre- 
cision, and furthermore that trials are performed at levels of the factors 
which are of real interest to the experimenter. To ignore the possibility of a 
sequential strategy when such is possible may result in a great loss of effi- 
ciency and confirms the experimenter’s worst fears concerning the good sense 
of the statistician. 

This point is illustrated by the example on page 426 of the book. This de- 
scribes an experiment to determine the effect of 7 factors thought to in- 
fluence the readings given by a machine designed to test the consistency of 
tinned foodstuffs. A 1/9th replicate of a $3^7$ factorial design was used, and this
involved 243 observations. After the experiment had been performed it was 
apparent that nothing like this number of observations was necessary either 
to attain the precision desired or to estimate complex effects. A sequential 
strategy beginning, for example, with a high order fractional replicate of a 
two-level factorial design would have provided a more efficient and natural
approach and would have achieved the desired result with a small fraction 
of the effort. This point is emphasized because it is believed that outside the 
field of agriculture the sequential situation is by far the most common one. 
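
The difference in scale between the two strategies can be made concrete. The sketch below checks the run count of the 1/9th replicate and builds one possible first-stage design for seven two-level factors in eight runs. The generators used (D = AB, E = AC, F = BC, G = ABC) are a common textbook choice adopted here as an assumption; the review does not specify which fraction it has in mind.

```python
from itertools import product

# Runs in a 1/9th replicate of a three-level factorial in seven factors.
runs_three_level_fraction = 3 ** 7 // 9
print(runs_three_level_fraction)


def two_level_fraction_7_factors():
    """An 8-run fraction of the 2^7 factorial (a 1/16th replicate).

    Factors A, B, C form a full 2^3 factorial; the other four columns are
    generated as products (an assumed choice of generators).
    Levels are coded -1 and +1.
    """
    design = []
    for a, b, c in product((-1, 1), repeat=3):
        design.append((a, b, c, a * b, a * c, b * c, a * b * c))
    return design


design = two_level_fraction_7_factors()
for run in design:
    print(run)
```

Main effects are aliased with two-factor interactions in so small a fraction, so it serves only as a screening stage; later sets of runs would be chosen in the light of its results, as the sequential argument requires.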

These reservations should not obscure the fact that here is an excellent 
text book of value to student, teacher and practical worker alike. 


An Outline of Biometry. C. I. Bliss and D. W. Calhoun. New Haven, Connecti-
cut: Yale Co-Operative Corporation, 1954. Pp. x, 272, xvi, plus 18 tables and 5 
figures. $4.50 plus 30¢ postage. Paper. 


H. Fairfield Smith, North Carolina State College


In a Galton Laboratory tea in 1937, when there were few text books to
guide a student in study of statistical methods for research, Fisher re- 
marked that the way to obtain a good one would be for everyone who might 
feel the urge to try his hand and see which product would survive. Even 
then the prospect of the volume of literature such a procedure might produce 
somewhat dismayed me. The flood is now upon us. Fair comparison is a task 
for a specialist in nothing but statistical texts. Some years ago I gave up in 
despair trying to index, let alone be acquainted with, current production; 
therefore I must disclaim ability to give accurate rating to any one. However, 







An Outline of Biometry should take a high place in the race for survival.

The tone of the work is set in the preface which remarks: “By providing a
working manual of lecture notes, essential formulae, numerical examples and
statistical tables, we hope that the student can minimize his note-taking and
give greater attention to class discussion. The Outline reflects two tenacious
principles of my teaching: that a course well stocked with tested alterna-
tives is the most effective compromise between the breadth of the field and
the time limits of the graduate program, and that the only way to grasp
basic concepts is to calculate numerical examples with actual data.” Ex-
cellent precepts and they are admirably executed! The book is intended
for an elementary course for biologists extending to 40 lectures accompanied 
by four hours of laboratory to each lecture. Although described by the auth- 
ors as a lecture manual its close gearing to practice makes it rather a labora- 
tory recipe manual. Yet it does not read as stiltedly as that description might 
suggest owing to the logical development from one topic to another and the 
skill with which the recipes are collated. Bliss states that the unusual for- 
mat was adopted because as a student he found a formal course outline to 
be an effective mode of study. Although one might anticipate such formula- 
tion to be deadly in a text book, the way this outline succinctly drives home 
each procedure looks like convincing evidence for the effectiveness of the 
method.

The work begins at a very elementary level with two pages of pointers in 
arithmetic which anyone reaching a statistics course should know from 
school, but statistics lends new eyes to a biologist’s arithmetic and the re- 
fresher may be worth while. Anyone who needs to be told, however, the 
order of arithmetical operations will not two pages later know the meaning 
of a derivative stated without explanation. Such inconsistencies seem inevi- 
table in elementary statistics texts, and presumably a reader of the first 
chapter is intended to skip the odd paragraphs where he is invited to do 
partial differentiation; the practical instructions are still clear without it. 
The subsequent computing instructions are both useful and needed. 

The subject matter is, in the main, that usually covered in an elementary 
course: starting from the probability interpretation of the classic tea tasting 
experiment it proceeds through the binomial distribution, $\chi^2$, normal dis-
tribution and tests based on it, interval estimation, analysis of variance,
simple regression and correlation, to Poisson and negative binomial distri- 
butions. The authors regret the lack of extension to covariance, partial 
regression, discriminant functions, probit analysis, sampling and sequential 
analysis. But covariance and sampling are the only ones that need be re- 
gretted. The others are probably better omitted from the kind of course 
envisaged by this book, and left to a more mature level. 

No attempt is made to teach theory, although the source whence methods 
derive is lightly indicated. At the same time, so differently from many ele- 
mentary texts, it is clear that the underlying theory is thoroughly under- 
stood by the authors themselves. Difficulties are not slurred over. An out- 




standing example is the statement in elementary language of the distinction 
between confidence and fiducial intervals, a statement which puts to shame 
many texts, superficially more erudite, which pass this by in a cloud. With
absence of theory goes any statement of maximum likelihood which appears 
only as an adjective for some statistics and in connection with a brief 
plunge into amounts of information associated with transformed metame- 
ters. This section must leave gasping those for whom Chapter 1 was written, 
but may stimulate some to delve deeper. 

Methods can seldom have been presented more concisely; there are only 
145 pages of text. With the help of a neat table of required F values, the 
Bross fiducial limits are stated in five lines. Orthogonal polynomials are 
presented in one and a half pages, and that includes a derivation for un- 
equally spaced values! Inevitably such compression leads to some inconsist- 
ency, as in the instruction to solve the general polynomial form by using 
powers of $x$ as independent variables, although multiple regression and its
normal equations have not been given. Presumably it is merely an indication 
to a worker to look up the reference cited.

Description of models I and II for analysis of variance pleasingly main- 
tains Eisenhart’s basic distinction which lies in the kind of parameters to 
be evaluated: of location or of dispersion. It states simply that model I di- 
rects attention to the means of “particular treatments of interest to the ex- 
perimenter,” and avoids the later superimposed idea of “fixed effects” whose 
whole population is present, a concept which confuses the student who com- 
plains that he can think of myriad other treatments essentially belonging to 
the same population. Exposition of the correlation coefficient is happily re- 
strained to sampling from bivariate normal populations for which it is an 
appropriate statistic and thus avoids suggesting uses, all too common, for 
which it is unsuited. Two examples reasonably conform but example XV-6a 
is a regrettable exception. Here the distribution of y is distinctly skew, and 
the regression of x on y is curved (although that of y on x may be linear),
so that these data are not appropriately described by the correlation co- 
efficient. 

Transformations get a larger than usual share of attention. Three and a 
half pages describe many by which curves may be converted to linear forms, 
including probits and logits; and a chapter of eleven pages is devoted to 
their uses for making numerous types of data amenable to analysis of 
variance. The utility of regression with transforms is developed with a 
chapter on bioassay when parallel linear regressions may be assumed. Nu- 
merous non-parametric and “quick and dirty” methods are described. Since 
arrangement is by types of data, these jostle with standard procedures in- 
stead of being relegated to another part of the book as if constituting a dif- 
ferent subject. The logical grouping thus achieved may make the book use- 
ful for reference by workers who do not need it for information but who have 
not yet accustomed themselves to think of these newer methods in their 
appropriate places in their battery of tools. Some may consider tests given 







for gaps in a group of means as too dogmatic, but in the present confused 
state of that subject they may be as good a choice as is practicable without
more discussion than would conform to the style of this book. A further
plunge is made in favor of certain “pool” rules which are not yet usually
regarded as tried recipes. It might be advisable to indicate that bias is only 
locally minimized for some gain of power, the absolute minimum of course 
being at “never pool.” 

Convention dictates some criticism if only to prove that the reviewer has 
not read uncritically! In a book so brief and practical it seems redundant 
to describe association coefficients and corrections for bias in the standard 
deviation: so far as I know the former have never yet served any useful 
purpose and the latter are never used. However they occupy together only 
three-quarters of a page. The first page of Chapter XII rather confuses the 
relations of regression with structural and functional relations. The first 
sentence of paragraph Ala seems to describe the Berkson case, yet that is 
introduced at the end of the paragraph as something different. In Chapters 
X and XI formulas for sums of squares between groups in terms of means 
appear incorrect unless Page 86 has been remembered with exceptional care. 
$S(\bar{y}_t-\bar{y})^2$ is intended to imply summation over all N values with repetitions
as entered in, for example, Table d of example X-3; whereas in the formula
to which it is set equal $S T_t^2/f$ implies as usual summation only over the
number of groups. In example XII-12 under M.S. for 284.45 read 248.45 (if 
not noticed the source of some of the F ratios may seem puzzling). 

At page 196 delete (N+1) from the denominator of formula XV-D3b2′ to
test deviation of $\bar{x}$, $\bar{y}$ from $\mu_x$, $\mu_y$ in a bivariate normal sample. For students
who wish to look further and see whence formulas derive some indication
should be given that this is the F transform of Hotelling’s $T^2$. Formula 1′
for the corresponding test if $\mu_x$ were known (again with the erroneous N+1)
seems to have been derived from 2′ by setting $\bar{x}=\mu_x$. Surely this is incor-
rect? Knowing $\mu_x$ does not mean that a bivariate sample has $\bar{x}$ equal to it.
Maximum likelihood indicates that under the given conditions the best esti-
mate of $\mu_y$ is $\bar{y}-b_{yx}(\bar{x}-\mu_x)$, which presumably would be tested with the
usual $t$. It may have been intended to suggest that a sample was drawn such
that $\bar{x}$ was deliberately made equal to the known $\mu_x$, but this would return
one to the ordinary regression case and a $t$ test.
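
For readers who want to see whence such a formula derives, the standard relation between Hotelling's statistic and F in the bivariate case can be sketched as follows. This is the textbook relation, F = (N-2)/(2(N-1)) times T squared on 2 and N-2 degrees of freedom, and is not a transcription of the book's formula, which is not reproduced in this review. The function name is hypothetical.

```python
def hotelling_f_bivariate(xs, ys, mu_x, mu_y):
    """Return (T squared, F) for testing the sample means of a bivariate
    sample against hypothesized means mu_x, mu_y.

    F is referred to the F distribution with 2 and N - 2 degrees of freedom.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample variances and covariance with divisor N - 1.
    s_xx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    s_yy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    d_x = mean_x - mu_x
    d_y = mean_y - mu_y
    determinant = s_xx * s_yy - s_xy ** 2
    # Quadratic form d' S^{-1} d written out for the 2 x 2 case.
    quadratic = (s_yy * d_x ** 2 - 2 * s_xy * d_x * d_y + s_xx * d_y ** 2) / determinant
    t_squared = n * quadratic
    f_value = (n - 2) / (2 * (n - 1)) * t_squared
    return t_squared, f_value


print(hotelling_f_bivariate([1, 2, 3, 4], [2, 1, 4, 3], 1.5, 2.5))
```

The multiplier involves only N-2 and N-1, which is consistent with the reviewer's point that a factor of N+1 has no place in the denominator.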

An interesting collection of 18 tables and 5 figures contains some which 
are not usually given in text books, such as 5 per cent points of
$F_{\max}=s_{\max}^2/s_{\min}^2$ in a set of k variances, range tests for cross classifications, and a chart
for standard error of the mean in truncated samples from normal distri-
butions. The nice chart of $\chi^2$ originally published by Bliss in 1944 is included.
A hundred and fourteen examples from real data must constitute one of 
the finest such sets of biological data ever assembled. The bibliography 
lists 41 well chosen texts and books of tables, and 240 miscellaneous cita- 
tions, the majority of which are the sources for examples culled from an 
impressively wide field. This list of citations introduces a feature for which 







I have often wished in bibliographies but never before seen put into prac-
tice: it indicates the page on which each citation is referred to. The book 
is produced by very clear multilith printing on one side only of 8½ × 11 inch
pages. This makes it bulky but leaves room for working notes. Cardboard 
covers of the wire-stitched review copy would seem unlikely to survive hard 
laboratory use; but the more popular form is punched for use in loose-leaf 
binders. We are informed that few copies are now left in stock. A revised 
and expanded edition is expected to be published in book form in about 
three years. 

If told another elementary text is to be written, my reaction is: Please, 
not another! But An Outline of Biometry is a well worth while effort and 
should find a useful place in teaching statistical methods. 


Statistics and Mathematics in Biology. Edited by O. Kempthorne, T. A. Ban- 
croft, J. W. Gowen and J. L. Lush. Ames, Iowa: The Iowa State College Press,
1954. Pp. ix, 632. $6.75. 


J. H. Bennett, Cambridge, England 


This publication comprises forty-four papers delivered during a five-week
conference on Biostatistics held at Iowa State College in 1952. The 
conference, which was organized by the editors of this volume, grew out 
of “the evident need for synthesizing the concepts and methods of biology 
with the concepts and methods of statistics and mathematics” and was the 
means for bringing together many biologists and statisticians, including 
some distinguished scientists, to discuss problems in quantitative biology. 

The text is arranged in five parts corresponding with the different topics 
considered during the five weeks of the conference: I. General Biometrical 
Principles and Procedures; II. Changes in Population Number; III. Esti- 
mation of Populations; IV. Determination of Biological Response; and V.
Genetical Analysis of Populations. Some but not all of the biological and 
statistical chapters are paired. As is fitting, this work is opened by Snedecor 
with a brief outline of the historical background of biometry. He is followed, 
first by Wright, who presents a résumé of some of his early work on path 
analysis, and then by Tukey who reminds us that regression is generally a 
more useful concept than correlation. Then Hotelling tells us of some con- 
tributions, mostly his own, to multivariate analysis. Problems of classifi-
cation on the basis of multiple measurements are taken up by Isaacson
and others in the next three chapters and this first section concludes, after a 
paper on the fitting of growth curves, with four chapters dealing with gen- 
eral questions of experimental design. These include two short review arti- 
cles and a paper by Quenouille on the use of genetically homogeneous sub- 
groups in experimentation. 

Part II comprises three chapters describing experimental and mathe-
matical studies of competition between species.







Part III, on the estimation of population number, is opened by Jessen
with a useful survey of the methodology of sampling human populations. The 
other seven chapters of this section deal with the methodology and some 
practical difficulties arising in forest inventory and in estimating the size of 
fish, wildlife and insect populations. 

Part IV begins with four chapters on bioassay; Cornfield’s lucid account 
of the comparison of toxicities with quantal responses and Bliss’ thorough 
discussion of insecticidal assays being two that are noteworthy. Six chapters 
on sensory tests for food products and nutritional and behavioral responses 
of animals make up the remainder of this section. 

Part V contains eleven chapters on subjects related more or less to popu- 
lation genetics. After a brief review by Levine of the genetics and racial 
frequencies of some human blood groups, Cotterman gives an account of 
the estimation of gene frequencies in populations. In two chapters, Neel and 
Schull write about various aspects of the genetic studies that have been 
made on atomic bomb survivors. Pollard describes the physical approach 
to the study of the biological effects of radiation on the living cell and 
Gowen writes about his radiation experiments with viruses. Outstanding 
chapters are contributed by Griffing who carefully explains the statistical 
and genetical analysis involved in a problem of quantitative inheritance 
in tomatoes and by Crow who brings together various approaches to the
genetical analysis of small populations and considers different ways of cal- 
culating an effective population number. The book ends with a chapter on 
the role of the nucleo-proteins in cell growth and division. 

The length and treatment of these chapters differ as much as their subject 
matter and authors. A wealth of material is scattered through the book 
even if, in some places, its distribution is rather thin. Many of the papers 
will provide admirable discussion material for graduate students interested 
in quantitative biology. 

The conference itself, although devoted, it would seem, to a field that was 
much too wide, must surely have given a direct boost to the use of quantita- 
tive methods in biology. But if the synthesis of the concepts and methods of 
biology with those of statistics and mathematics means anything more than 
bringing together all these papers, it has not been achieved in this book. 

Not all papers delivered at conferences have lasting value or deserve 
preservation. If this were more widely realized, volumes such as the one 
under review would be much smaller and no less valuable. 


Statistical Analysis in Chemistry and the Chemical Industry. Carl A. Bennett and 
Norman L. Franklin. New York: John Wiley and Sons, Inc., 1954. Pp. xvi, 724. 
$8.00. 


F. R. Himsworth, Imperial Chemical Industries, Ltd.


It is well, before reviewing a book, to consider for whose benefit it is
written; the present volume is addressed to “those in the chemical indus- 
try whose interest in this subject has quickened in recent years.” It will be 







found interesting and stimulating by those chemists who have made a more 
than average study of statistics, and by statisticians working in industry, 
but not by chemists who are mildly interested, or who want to learn ele- 
mentary statistics; these will find the book very heavy going, and will be 
repelled, rather than helped, by the masses of algebra used in deriving sam- 
pling distributions, expectations of mean squares, and regression formulas. 
This is not a criticism of the book: it is intended to warn a class of reader 
who would not appreciate it, and who would, if he survived the discussion of 
probability in Chapter 3, give up in Chapter 4. 

Chemists who are mathematically inclined, and who have already read 
widely, will find this book very helpful. They will find in it derivations of the 
tests they have hitherto taken on trust; they will find useful accounts of 
order statistics and non-parametric tests which will enable them to escape 
from the dilemma of what to do if the data are not normally distributed; 
and many techniques which will occasionally be useful, but which are other- 
wise available only in journals. They should, however, have a clear idea of 
the practical applications of statistics, and of the action to be taken after the 
statistical analysis has been completed, before they attempt to master the 
book. There is a strong tendency in the many examples to carry out a de- 
tailed statistical analysis, ending up with conclusions such as “all main ef- 
fects are significant.” This may illustrate the particular technique being dis- 
cussed, but it would be much more helpful if these conclusions were trans- 
lated into practical guides for action, even in a book of this nature. 

A very sound piece of advice to anyone about to analyze results statisti- 
cally is to look at the data, summarizing them in sub-tables and graphs; 
this will often tell him, at least qualitatively, all he wants to know. It may 
tell him that two factors interact strongly, in which case an analysis of the 
whole body of data is often valueless. The example on p. 395 illustrates this 
point admirably. Simple inspection conveys all the necessary information, 
and leads to conclusions more complete and more precise than those given. 
It may be said that the object of the example was to illustrate a method of 
analysis, but any example should give the best treatment of the data, and 
should be used to illustrate a technique only if this technique is the appropri- 
ate one. This criticism can be made of many of the examples. It is only when 
writing a textbook that one realizes how few suitable examples there are for 
illustrating a simple technique. The fact is that in practice, matters are 
rarely simple enough for the straightforward application of a standard tech- 
nique, a warning which might well be given in texts in industrial statistics. 

Chapter 2 deals with the assembly, grouping and graphing of data, and 
with simple statistics of location, dispersion and, perhaps rather prema- 
turely, regression. It is stated on p. 20 that the sample standard deviation, 
mean deviation, and range are all biased estimates of $\sigma$. The first will be
found puzzling without some explanation, and the bias is so slight that the 
statement is better omitted. To say that mean deviation and range are 
“biased estimates of $\sigma$” is playing with words, and distinctly unhelpful.







Here, and throughout, too many significant figures are retained in the final 
answers. 
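
The size of the bias in the sample standard deviation can be computed directly. The sketch below assumes normally distributed observations and the usual divisor n-1, under which the expected value of s is a known multiple of sigma, often written c4(n). It is an illustration added for reference, and the function name is hypothetical.

```python
import math


def c4(n):
    """Ratio E(s) / sigma for a normal sample of size n (divisor n - 1)."""
    log_ratio = math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


for n in (2, 5, 10, 30, 100):
    # Percentage by which s underestimates sigma on average.
    print(n, round(c4(n), 4), round(100.0 * (1.0 - c4(n)), 2))
```

The factor approaches one roughly as 1 - 1/(4n), so the shortfall matters only for very small samples.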

Chapter 3 on “Probability and Samples,” although not very clear to a
newcomer to the subject, gives the fundamental principles on which later
arguments are based. The independence of variables is briefly discussed, but
the chemical example is badly chosen; nitrogen content and nitrogen/phos-
phorus ratio are unlikely to be independent. The matter is further obscured
on p. 53, where the original weight of a coal sample and its weight after 
drying are said to be independent! Independence of errors is intended. The 
use of “ave (z)” = “average value of xz” in place of the usual E(x) = “expected 
value of x” has little to commend it, and leads to some curious expressions, 
such as “the average value of a single observation,” “average mean square,” 
etc. “Mean” is sometimes used instead of “average,” and “expectation” in 
at least one place. 

Chapter 4 on “Mathematical Machinery” will be interesting to the en- 
thusiast; it deals with generating functions, semi-invariants and other 
matters to be found only in works on mathematical statistics, and gives full 
derivations of the important sampling distributions. The letter “t,” which 
used to be an estimated mean divided by its estimated standard error, is 
sadly overworked. We have t, t,, t’, t’’, t’’’, and maybe more.

Chapter 5 on “Statistical Inference” covers confidence limits and tests of 
significance along the usual, and some unusual, lines. It is unnecessarily 
befogged by the inelegant shorthand of probability theory; plain language 
would occupy little more space, and would make the situation clearer. The 
method given for finding the number of trials required, for example, is ob- 
scure, and should be completed by finding the acceptance values for the null 
hypothesis. A curious suggestion is that the z-test, assuming z normally 
distributed, is quicker than the F-test. The section on “Sequential Tests” is 
inadequate. 

Chapter 6 deals with Relationships between Variables, i.e. regression and 
correlation, along conventional lines, with the usual complicated algebra and 
arithmetic. Regression always attracts the beginner, since apparently he 
can extract all the information from any old data. In fact it bristles with 
traps for the unwary; multiple and curvilinear regression should be ap- 
proached only by experts, and with the utmost circumspection. Simple 
methods should always be used before an elaborate calculation is under- 
taken. A “significant” value of r may be quite unhelpful, such as that on 
p. 280, whose 95% limits are 0.05 and 0.92! The final sections on “Correla- 
tion of More than Three Variables” and “Discriminant Functions” are not 
likely to tempt the reader to use these techniques, which is perhaps as well; 
these are games for professionals (if anyone) to play. 
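
Limits as wide as those quoted are what a small sample produces. A minimal sketch of the usual Fisher z-transformation interval follows; the values r = 0.66 and n = 10 are hypothetical and are not the data of p. 280, which the review does not give.

```python
import math
from statistics import NormalDist


def fisher_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Approximate confidence limits for a correlation coefficient.

    Uses Fisher's transformation z = atanh(r), whose standard error is
    roughly 1 / sqrt(n - 3) for bivariate normal data.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    q = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return math.tanh(z - q * se), math.tanh(z + q * se)


# Hypothetical illustration: a "significant" r from only ten pairs.
low, high = fisher_ci(0.66, 10)
print(round(low, 2), round(high, 2))
```

The lower limit barely clears zero, so the coefficient is "significant" while saying almost nothing about the strength of the relationship.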

Chapters 7 and 8 on “Analysis of Variance” and “Design of Experiments” 
occupy 280 pages—a fair sized book in itself—and are, in the reviewer's 
opinion, the most useful and interesting of all. They could well be expanded, 
subdivided, and made into a separate volume. In addition to the conventional
treatment, there is much detailed discussion on such matters as
“variance components” and confidence limits for these, co-variance, signifi- 
cance levels when a number of mean squares are to be tested, and other 
thorny problems not usually mentioned in books on applications of statistics. 
Too much emphasis is placed on the testing of null hypotheses, though in 
some cases this is supplemented by the calculation of confidence limits, little 
or nothing being said about the power of the tests. 

It has always seemed to the reviewer that for an experimenter to choose 
his conditions carefully, carry out an expensive experiment and a lengthy 
analysis, and then to apply a test which starts from the hypothesis that the 
experiment has been a complete failure, is somewhat illogical, and reflects 
great lack of faith in his scientific knowledge. It is usually inconceivable 
that the null hypothesis, σ²=0, could be true; so why test it? What do we do
if it is not contradicted? This book goes a certain length in the right direction, 
but the F-test is solemnly applied to test a hypothesis which cannot possibly 
be true, and which is obviously untrue from a rough inspection of the data. 
It is a good rule never to carry out the formal analysis without first looking 
at the data and drawing any obvious conclusions. In many of the examples 
the conclusions finally reached are obvious, and in some cases the obvious
conclusions are not reached at all. When a large interaction occurs in a 
crossed classification there may be no point in analyzing the data as a whole; 
two entirely separate sets of conclusions may be required, each applicable 
to one level of a factor. An excellent example is that on p. 395, on the testing 
of coal for volatile matter in steel and silica crucibles. One coal gave fair 
agreement and the other hopeless disagreement, and the data as they stand 
are not worth analyzing. From the data on p. 368, all the conclusions can be 
arrived at very simply, without analysis of variance, which only obscures 
the issue. A check on the residual sum of squares, from the differences be- 
tween duplicates, reveals an error, apparently in the table of results. The 
importance of checking calculations is not sufficiently emphasized; simply 
repeating the calculation is not good enough. 
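
The check from duplicates rests on a simple identity: for each pair of duplicate determinations, the sum of squares about the pair mean equals half the squared difference. The sketch below uses hypothetical figures, not those of p. 368.

```python
def ss_from_differences(pairs):
    """Within-pair sum of squares computed as sum(d^2) / 2."""
    return sum((a - b) ** 2 for a, b in pairs) / 2.0


def ss_about_pair_means(pairs):
    """The same quantity computed the long way, about each pair's mean."""
    total = 0.0
    for a, b in pairs:
        mean = (a + b) / 2.0
        total += (a - mean) ** 2 + (b - mean) ** 2
    return total


# Hypothetical duplicate determinations.
pairs = [(10.2, 10.6), (9.8, 9.5), (11.0, 11.1)]
print(ss_from_differences(pairs), ss_about_pair_means(pairs))
```

Because the two routes are arithmetically independent, agreement between them is a genuine check, which mere repetition of the same calculation is not. The sum of squares carries one degree of freedom per pair.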

These chapters will repay (and will need) careful study. There is a real 
attempt to face the many difficulties which are usually ignored. They would 
be clearer if allowance for random samples from finite populations had not 
been made so regularly, leading to much very cumbersome algebra which, 
incidentally, contains several mistakes, and if interaction terms had not 
been included in the expectations of the mean squares, only to be under- 
lined, the underline meaning “omit for a Type I set-up, which this is,” as 
the reader will discover after he has been puzzled somewhat. 

Chapter 9 on “Analysis of Counted Data” and Chapter 10 on “Control 
Charts” follow conventional lines and call for no special comment. Chapter 
11 on “Some Tests for Randomness” may not be found particularly useful, 
and the example on p. 674, in which the significance test contradicts the 
common sense view, is not helpful, particularly when two other tests strongly 
support common sense. 





A review tends to contain more words of criticism than of appreciation, 
sometimes rightly so. This book can, however, be strongly recommended to 
all serious students of applied statistics. Because it breaks so much ground 
that is new, so far as textbook exposition is concerned, it inevitably invites 
criticism on points of detail. It is a real attempt to introduce new and less
well-known techniques to the industrial statistician. Some of these may not 
stand up to the requirements of practical usage, but they deserve to be 
better known and to be tried in practice. Non-parametric tests are a case in 
point. The reader will also find that after digesting the book he will
understand the standard techniques better than before, and will appreciate some
of their shortcomings. He may be in some respects a sadder, but he will 
certainly be a wiser man. 

Printing and lay-out are good, though some tables seem to be unnecessarily 
separated from the accompanying text. The book has a fair number of errors 
and misprints. An annoying habit is to use “1,” “2,” etc. in the text rather 
than “one,” “two,” etc. 


Probleme der Statistischen Methodenlehre in den Sozialwissenschaften. Oskar 
Anderson. Würzburg: Physica-Verlag, 1954. Pp. 345. 16.00 D.M.


WERNER Z. HIRSCH, Washington University


This text offers a concise treatment in the German language of many of
the concepts and methods of modern statistics and is designed to meet
the needs of a second or intermediate course in statistics for social scientists, 
especially economists. The author expects those who use the book to have 
the mathematical preparation of the average German high school graduate 
and no fear of the algebraic formulations of ideas. 

The book is organized into nine chapters: (1) Discussion of statistical 
method, (2) Descriptive measures, (3) Index numbers, (4) Statistical errors, 
(5) Probability theory, (6) Decision making, (7) Time series analysis, (8) 
Correlation, and (9) Logic of statistical method. Since the chapters vary 
greatly in length, the orientation of the book is perhaps best made clear by 
stating that the beginning one-third of its material is descriptive, while the 
rest deals mainly with problems of inference. This reviewer has no quarrel 
with the order and organization of the material. Not only does it have the 
advantage of putting easier things first, but also it is quite logical. Yet, 
many statisticians may not agree with the balance and inclusiveness of the 
presentation. 

Following a brief discussion of the arithmetic mean and standard devia- 
tion, the author treats index numbers in a very effective manner and in 
great detail. This chapter proves especially instructive because, after a
rather conventional treatment of the main problems and formulas of index
numbers, there follows a stimulating discussion of some of the theoretical
and practical problems of cost of living and production indexes.




The next section is proof of the author’s experience and his ability to 
use his mathematics admirably well. His idea is to prove rigorously a few 
key concepts and to derive from them a number of important statistical 
rules and formulas. From a detailed discussion of probability flows the bino- 
mial theorem and the Poisson distribution, from which in turn are derived 
the rules and formulas for making confidence interval estimates and for test- 
ing hypotheses. The logical flow of this section is excellent. Many statisticians 
will be disappointed, however, that errors of types I and II receive only
brief mention, where the topic of inference is introduced. The presentation of
significance tests is concerned exclusively with making decisions with a known
probability of committing an error of type I, and errors of type II are
altogether neglected.

The chapter on time series analysis may perhaps be criticized more than 
any other chapter, from the point of view of balance. Following some 
general introductory remarks, the conventional least squares method of 
finding trends is presented. Since the author helped develop the variate- 
difference-method, he spends considerable time on this rather infrequently 
used technique. In a few pages and without examples he disposes of the 
problem of seasonal variation, without indicating what to do about irregular 
variations and how to isolate cyclical variations. The applications to which 
time series analyses can be put are left very much in the dark. An omission, 
though of less importance, is the lack of significance tests for trend and sea- 
sonal variation. For a book which addresses itself to social scientists in 
general and economists in particular, a more appropriate section on time 
series analysis could have been written. 

The final and important section of the book, dealing with correlation, is 
very well written. With a minimum of mathematics the relations between 
correlation and regression analysis are shown, and the various formulas 
are derived and applied. The inferential aspects of correlation are ably pre- 
sented. The only complaint that one might have is that of the sixty pages 
of the chapter on correlation, only ten are devoted to multiple correlation. 

This reviewer believes that the inclusion of a discussion of decision making 
when there are more than two choices—analysis of variance—could have 
rounded out the book. Also, the concepts of estimation and decision making
might have been briefly extended into the area of sampling
methods, which are becoming so very important as tools of the social
scientists.

Many statisticians are wondering what new contributions to the field 
are being made in other languages, especially German. In this connection it 
is interesting to note that in summing up the state of knowledge of statistics 
in general, and of statistics for social scientists in particular, the author 
draws almost exclusively on the work done outside Germany. His detailed 
index of authors comprises about one-and-one-half times as many Anglo- 
Saxon as German references. 

In summary, this book is a valuable addition to statistics texts in the
German language. It is well written and the material in it is up to date.
There can be no doubt that it will greatly contribute to making available to
those who read only German much of modern statistics as it has been
developed and taught abroad.


Probability Theory. Michel Loève. New York: D. Van Nostrand Company,
Inc., 1955. Pp. xv, 515. $12.00. 


WALTER L. SMITH, University of North Carolina


IT APPEARS to be a growing practice for authors of books on statistics to
provide in their prefaces a suggestion concerning the way in which readers
should study their works. Recently, Leonard J. Savage recommended us
to read about the foundations of statistics sitting bolt upright on a hard chair,
at a desk, and now Loève asks us to approach his monumental treatise on
the foundations of probability theory “armed permanently with patience, 
pebble, and reed”. This advice is to be taken seriously, for the 515 pages of 
this masterly treatise contain an enormous wealth of material much of which 
is presented, of spatial necessity, in a very concise style, and, moreover, Van 
Nostrand have found it desirable to resort to a smaller type than usual, 
which makes this book somewhat more difficult to read (in the physical 
sense) than their earlier books in this series. 

Loève’s book, then, is an authoritative account of probability theory as
it exists at the present time, and the dust jacket claims accurately that, in 
the class of books dealing with probability theory, “there is nothing as 
comprehensive in treatment, ..., or as up-to-date”. But this is definitely 
a book for the specialists. Many highly successful practicing statisticians 
will dispute Loève’s claim that Part One of his book contains the notions of
Measure Theory that every statistician requires. (Has Fisher ever used the 
Radon-Nikodym theorem?) Few statisticians with less than powerful proba- 
bilistic tendencies will feel prepared to pay the high cost of this book and 
devote to it the close study it most certainly deserves. Nevertheless, those 
who are primarily concerned with probability theory have every reason to be 
grateful for the present work, which is destined to become a standard ref- 
erence in the subject, and which fills so adequately an unfortunate gap in 
the literature of probability (recently lamented by Doob in his “Stochastic 
Processes”). 

The first 50 pages are devoted to an introductory treatment of familiar 
“elementary” problems in probability theory, which do not require the use 
of measure theory. The purpose of this section is to introduce in a natural 
way the basic ideas involved in probability theory. It does not replace the 
sort of treatment one finds in, say, Feller’s “An Introduction to Prob- 
ability Theory”, but forms a useful and compact revision course and intro- 
duction to the notation to be employed (it includes a discussion of Markov 
Chains on the lines of Kolmogorov’s classical paper). A similar remark 
applies to Part One, which fills the next 94 pages and is devoted to Measure 
Theory. In spite of the existence of a few excellent and readily available
treatises on Measure Theory, it is perhaps desirable to have a collection of
the most important measure-theoretic results included in a treatise on
probability theory. Loève provides us with more than this: he supplies a
“self-contained” course on measure theory. If the reader has already obtained a
grounding in measure theory (such as may be obtained by reading Munroe’s 
“Introduction to Measure and Integration”) then Part One becomes, like 
the introductory section, an excellent revision course. But the level of
conciseness that Loève has aimed at has entailed a somewhat sophisticated
attitude which a reader meeting measure theory for the first time might easily
find trying.

The thorough-going measure theoretic treatment of probability theory 
begins on page 149 and occupies the remaining 366 pages. It is divided into: 
Part Two, on the general concepts and tools of probability (74 pages); Part 
Three, on sums of independent random variables and their limit properties 
(114 pages); and Part Four, devoted to a careful explanation of conditioning,
to limit properties of sums of dependent random variables, and in the last 
section to random functions of the second order. An enumeration of chapter 
headings would be tedious, but certain chapters deserve especial mention. 
Chapter V on sums of independent random variables is excellent. Chapter VI, 
on the problem of convergence of laws of sequences of sums of random vari- 
ables deals not only with the classical theorems of the Central Limit type, but 
with the problems of more recent origin concerning the determination of the 
family of all possible limit laws and of the conditions for convergence to 
these laws. This chapter gives a welcome exposition of a subject which, 
until recently, one could only study in papers scattered through many jour- 
nals (and mostly in languages other than English). For similar reasons, it is 
good to find that Loève has included an up-to-date account of Ergodic
Theory as Chapter IX. 

There are many brief historical sketches of the development of various 
problems and of their solutions, a feature which will appeal especially to 
newcomers to the subject. There is also a very adequate supply of examples 
(“complements and details”) at the end of each chapter. Unfortunately, 
several misprints occur in this first edition, but the reviewer learns that 
an errata sheet is now being provided, and these errors will doubtless be
eliminated in any later editions. 

Every serious probabilist should, and doubtless will, possess a copy of 
this important work. Loève is to be complimented on completing his
Herculean task at a uniformly high level of elegance (a task which, one gathers
from his preface, he regards as an essay in the poetic form!). 


A Million Random Digits with 100,000 Normal Deviates. RAND Corporation. 
Glencoe, Illinois: The Free Press, 1955. Pp. xxv, 400, 200. $10.00. 

This is by far the largest and best collection of random digits yet (or
likely to be) published in book form. It is the collection from which
the 22,475 random digits published in this JOURNAL from 1952 to 1954
were taken.





The million random digits are arranged 2500 to a page on pages numbered 
1 to 400. There are fifty numbered rows on a page, each row divided into 
blocks of 5 and the blocks set off in pairs. In addition there are 100,000 unit 
normal deviates of 4 significant figures (3 decimals) and a plus or minus sign.
These have been derived from 500,000 of the random digits in a manner 
and sequence which is fully described. They are arranged 500 to a page, 10 in 
each of 50 numbered lines, on pages numbered 1 to 200. 

The 25 page Introduction, which regrettably is unsigned, is a worthwhile 
piece of statistical literature itself. The method of producing the random 
numbers is described, and a series of interesting tests of their randomness is
presented. The instructions for using the tables are excellent; they include
simple and clear directions for generating from the random digits random 
numbers from any specified population. 
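
The arithmetic of the layout, and one common way of turning blocks of digits into normal deviates, can be sketched as follows. The conversion shown is the generic inverse-distribution method and is only an assumption for illustration; the procedure actually used is the one described in the book's Introduction. The figure of 5 digits per deviate follows from the 500,000 digits and 100,000 deviates mentioned above, and the function name is hypothetical.

```python
from statistics import NormalDist

# Layout as described: 50 rows of 50 digits per page, 400 pages.
DIGITS_PER_PAGE = 50 * 50
TOTAL_DIGITS = DIGITS_PER_PAGE * 400
# Deviates: 10 in each of 50 lines per page, 200 pages.
TOTAL_DEVIATES = 10 * 50 * 200
DIGITS_PER_DEVIATE = 500_000 // TOTAL_DEVIATES


def deviate_from_block(block: str) -> float:
    """Map a block of random digits to a unit normal deviate (3 decimals).

    The block is read as a number, centred in its cell so that the
    resulting uniform value is never exactly 0 or 1, and passed through
    the inverse normal distribution function.
    """
    u = (int(block) + 0.5) / 10 ** len(block)
    return round(NormalDist().inv_cdf(u), 3)


print(TOTAL_DIGITS, TOTAL_DEVIATES, DIGITS_PER_DEVIATE)
print(deviate_from_block("10097"), deviate_from_block("97500"))
```

The same idea, with another inverse distribution function in place of the normal, serves for sampling from any specified population.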

The book is excellently printed and bound, and the price seems reasonable 
for the size. It is a book which most statisticians will want to own, and if the 
history of previous tables of random numbers is any criterion, it may not be 
in print long. 

W.A.W. 


A History of the Faculty of Political Science, Columbia University. R. Gordon 
Hoxie, Sally Falk Moore, Joseph Dorfman, Richard Hofstadter, Theodore W. 
Anderson, Jr., John D. Millett, Seymour Martin Lipset. New York: Columbia 
University Press, 1955. Pp. x, 326. $4.50. 


WILLIAM R. PABST, JR., Washington, D. C.


This history of the Faculty of Political Science is written in two parts.
Part I, by R. Gordon Hoxie, emphasizes the administrative aspects of the
establishment of the School of Political Science in 1880 as an adjunct and 
extension to the School of Law, and its growth and development into the 
Faculty of the present day. Part II contains a history of each of the six 
existing departments, written by members of each department. 

The development of statistics at Columbia reflected in this history il- 
lustrates the changing character of its subject matter as much as its institu- 
tional position. The history of statistics as a unified subject matter discipline 
is tersely presented in the chapter on the Department of Mathematical 
Statistics written by Theodore W. Anderson, Jr. In this, Anderson shows 
that the Department was established in 1946 as a result of the efforts of 
Harold Hotelling, who ironically was called to the Institute of Statistics 
at the University of North Carolina just before the new department was 
finally recognized. Hotelling initially came to the Department of Econo- 
mics in 1931 to carry on the mathematical work of H. L. Moore, but in 
subsequent years turned from mathematical economics to concentrate on 
mathematical statistics. In 1938, he brought Abraham Wald to Columbia as 
an assistant. Wald became the first head of the newly established depart- 
ment in 1946. 





The history of applied statistics at Columbia is contained in the accounts 
of four of the five other departments. In the account of the Department of 
History, there is reference to the work of Richmond Mayo-Smith, whose 
Science of Statistics (two volumes) was at one time widely recognized. Under 
the Department of Sociology, one will find the names of Franklin Giddings;
Robert E. Chaddock, who was appointed Assistant Professor of Statistics in
1911 and was president of the American Statistical Association in 1925;
William F. Ogburn, president of the American Statistical Association
in 1931 and a past editor of this Journal; and Frank A. Ross, also a past editor
of this Journal. Under the Department of Economics, the history contains
the names of Henry L. Moore, Wesley Mitchell, and Frederick C. Mills, the
latter two of whom were presidents of the American Statistical Association
in 1918 and 1934, respectively. In the Department of Anthropology, Franz
Boas’ course in statistics, termed “wholly theoretical”, remained the “foun- 
dation of his teaching for forty years”. 

The chapter on the history of the Department of Economics written by 
Joseph Dorfman is especially noteworthy for the tenderness and affection 
with which he assesses the contribution of his predecessors and his colleagues, 
even to noting the “humanizing, intangible values” contributed by Mrs. 
Gertrude D. Stewart, secretary to the department for almost half of its 
seventy-five years. 


Quality Control Through Statistical Methods. Norbert Lloyd Enrick. Institute 
of Textile Technology: Modern Textiles Magazine Handbook No. 3, 1954. 


ROBERT J. HADER, North Carolina State College


Enrick, Research Statistician, Institute of Textile Technology,
has prepared a generally excellent manual on quality control for the 
textile industry. In the past few years there has been a very rapid increase in 
interest in statistical quality control in this industry. The American Society 
for Quality Control now has a newly formed and very active Textile Section. 
Numerous articles on the subject have appeared in the literature; in fact, 
the material of this handbook originally appeared as a series of articles in 
Modern Textiles Magazine. 

Most modern mills have quality control in some form, though as yet dis- 
appointingly few use the statistical tools found so useful by other industries 
for this purpose. Enrick’s manual is intended to present these techniques— 
control charts, “analysis of variations,” and sampling acceptance inspec- 
tion—in simple straightforward manner using textile illustrations and textile 
language. Enough theory is given, though in non-mathematical form, so that 
the reader will have an appreciation of the logical foundations of the methods. 

The first four chapters are concerned with frequency distributions, aver- 
ages, and measures of variation. The arithmetic average, median, mode, 
harmonic mean, standard deviation, coefficient of variation, range, mean 
deviation, and average per cent variation are all included. Somehow the
“upper half mean length” has been overlooked. On the whole the discussion
of the relative merits of these measures is good. 

Chapters 5 and 6 deal with “Analysis of Variations” and “Process to 
Process Variations Analysis”. The author shows how sources of variation 
in the process may be analyzed by techniques analogous to analysis of 
variance but using ranges instead of variances. This material is particularly 
well presented. It suffers somewhat from lack of discussion of sample sizes
necessary to carry out such analyses. 
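
The general idea of working from ranges can be sketched as below. This is the standard control-chart estimate of a standard deviation from the average subgroup range, offered as an illustration and not as a reproduction of Enrick's procedure; the data are hypothetical, and the divisors are the usual tabulated constants for normal samples of 2 to 5 observations.

```python
# Expected range divided by sigma for normal samples of size n (tabulated d2).
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}


def sigma_from_ranges(subgroups):
    """Estimate the within-subgroup standard deviation as mean range / d2.

    All subgroups are assumed to have the same size, between 2 and 5.
    """
    size = len(subgroups[0])
    ranges = [max(g) - min(g) for g in subgroups]
    mean_range = sum(ranges) / len(ranges)
    return mean_range / D2[size]


# Hypothetical yarn-strength tests, four readings per bobbin.
bobbins = [
    [101.0, 98.5, 103.2, 99.8],
    [97.4, 100.1, 102.6, 98.9],
    [104.0, 99.2, 100.7, 101.5],
]
print(round(sigma_from_ranges(bobbins), 2))
```

How many subgroups are needed before such an estimate is trustworthy is exactly the question of sample size that the review finds missing.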

Chapters 7 and 8 are devoted to control charts for averages, per cent
defective, and defects. Again, insufficient attention is given to the sampling
problems encountered in a quality control program in textile manufacturing.
Not the least of these problems is that of providing effective and economical 
routine control on large batteries of machines of the same type when quality 
aberrations are likely on individual machines. 

Chapters 9 and 10 are on “Sampling Economically and Effectively” and 
“Ready Made Sampling Plans”. In the latter chapter sequential sampling 
gets heavy emphasis. A set of single sampling plans is given but passed over 
rather hurriedly. 

There is a final short chapter on “Management Aspects of Quality Con- 
trol.” 

All in all it is the reviewer’s opinion that Enrick’s manual is to be highly 
recommended to its intended audience. 


City of Birmingham Abstract of Statistics Number 3, 1952-1954. Edited by 
Richard Padley, Ph.D., Statistical Officer to the Corporation, and A. B. Neale, 
B. Com. Published for the Corporation by the City of Birmingham Central 
Statistical Office, by order of the General Purposes Committee, 1954. Pp. 139. 
Paper. Ten Shillings and Sixpence. 


DE VER SHOLES, Chicago Association of Commerce & Industry


This compilation of statistical data for a city is interesting both for its
content and its method of compilation. Few, if any, American cities have
Central Statistical offices which publish local data in as comprehensive a
form as has been done by Birmingham, England. 

The Birmingham report covers the following major statistical categories: 
Population; Vital Statistics; Health; Housing; New Building; the City Es- 
tates (public housing); Central Redevelopment Areas; Education; Police and 
Fire Services; Care of Children; Miscellaneous Social Services; Transport, 
Communication and Water Supply; Employment, Industry and Trade; 
Government; Meteorology. The data are presented in tabular form through- 
out, with little text material. 

The data reveal, to some extent, the nature of the British way of life in 
which governmental information is available on many phases of urban 
activity for which American cities have no systematic reporting. This is 
especially true of the health statistics, which are in greater detail than is
generally found in American communities. On the other hand, the industrial
statistical coverage, other than employment data, is almost completely 
lacking. The only data on industry other than employment are for size of 
manufacturing plants by number of employees and for factory buildings 
under construction. 

The population statistics are in no more detail than is generally available 
for the United States cities. The smallest geographic units are the 38 wards, 
for which population is estimated annually 1946-1953. Birmingham evi- 
dentally has problems of continuity of small area boundaries similar to 
United States cities, as the Ward boundaries were completely revised in
1949, so that all ward statistics break at that point. 

It is also interesting to note that in line with many American cities,
Birmingham grew by 12 per cent in the period between 1940 and 1953, according to
the annual estimates of population. Detroit, for example, grew 14 per cent 
over the same period. Birmingham had a total population of 1,118,000 in 
1950; Detroit had a 1950 population of 1,850,000. 

Some tables, prepared by the Central Statistical Office, are presented on 
the extent of annual net migration. This is generally the unknown factor
in estimating population of most large United States cities. The Birmingham 
Central Statistical Office calculates annual net migration as the difference 
between the population estimated from life tables and the population esti- 
mates of the Registrar General. No statement as to the reliability of those 
estimates is given. 
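
The residual calculation described can be sketched as follows. All figures, the three broad age groups, and the sign convention (official estimate minus the population expected without migration) are assumptions for illustration; the Abstract's own tables and survival rates are not reproduced in this review.

```python
def expected_survivors(cohorts, survival):
    """Population expected a year later with no migration.

    Each cohort is multiplied by its (life-table) probability of
    surviving the year.
    """
    return sum(n * s for n, s in zip(cohorts, survival))


def net_migration(official_estimate, cohorts, survival, surviving_births):
    """Net migration as a residual: official estimate minus expectation."""
    expected = expected_survivors(cohorts, survival) + surviving_births
    return official_estimate - expected


# Hypothetical city: three broad age groups and their one-year survival rates.
cohorts = [300_000, 500_000, 300_000]
survival = [0.998, 0.995, 0.950]
print(round(net_migration(1_105_000, cohorts, survival, 18_000)))
```

Since the result is a small difference between two large estimates, errors in either one pass straight into it, which is why a statement of reliability would be welcome.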

Under Chapter 2, “Vital Statistics”, there are three sections: 1. Births; 
2. Marriages; 3. Deaths. Most of the information under this chapter is 
available for American cities, although much of it is unpublished. 

Similarly Chapter 3, the largest chapter in the volume, containing 36 
tables covering 30 pages, has brought together many statistics on different 
phases of health, infant mortality, infectious diseases, civic health services, 
much of which is available upon request in large United States cities but 
which few United States cities publish. Chicago, for example, has available 
probably much more data on infant mortality than any other city in this 
country, and has these data broken down in 935 census tracts within the 
city limits. The Birmingham data are for wards only, of which there are 38. 
However, there is very little detail on infant mortality actually published
in Chicago, much less than has been presented in the Birmingham volume. 

Other subjects covered in this volume are handled in much the same 
manner as population, vital statistics, and health statistics just discussed, 
although with fewer tables. Few American cities publish annual reports in 
which the statistical data pertaining to the city’s population and industrial 
growth are presented. Even where such reports are made, statistical data 
are almost never presented in the detail found in the Birmingham volume. 

One feature of the Birmingham publication which can be somewhat im- 
proved is the narrative description of the contents of the various chapters. 
While some pitfalls in the use of tables are pointed out, there is very little
information on the method of compilation of the statistics, especially where
the data are estimated. It is very difficult, therefore, to formulate any 
opinion as to the validity of the estimating procedures or the reliability of 
the estimates themselves. A somewhat expanded explanation of each table 
would be extremely helpful. 

The outstanding feature of the Birmingham Abstract of Statistics is that 
it is published by the city’s own Central Statistical Office, which seems to 
be an almost unheard of adjunct to American city governments. Although 
American cities have the same data available, and some departments of 
city governments may publish similar data in their own annual reports, 
none of the United States’ cities, as far as this reviewer is aware, maintain 
a Central Bureau charged with collection and publication of pertinent sta- 
tistical data. It would be helpful for city governments in the United States 
to secure copies of the Birmingham publication as an example of what a 
city’s Central Statistical Bureau can produce in the way of factual information
helpful to business, welfare agencies, medical interests, city planners,
and others who are interested in local area data. 


The Redistribution of Income in Postwar Britain. Allen Murray Cartter. New 
Haven: Yale University Press, 1955. Pp. viii, 242. $5.00. 


Selma F. Goldsmith, U. S. Department of Commerce


This interesting book presents estimates for 1948-49 of the distribution by
income brackets of the several types of central government tax liabilities 
in Great Britain and of the various benefits received by consumers in the 
form of central government expenditures. For taxable income brackets up to 
£500 benefits received are found to exceed tax liabilities by a substantial 
amount, and for higher income brackets the reverse is the case. The break- 
even point is somewhere between £550 and £650 of taxable income depend- 
ing on which of three assumptions is used as to the distribution by income 
brackets of certain types of “indivisible government expenditures” (civil
expenditures for general governmental operations, expenditures on the armed
forces, interest on the national debt, and the government surplus). About
80 per cent of the population is estimated to have been on the gaining end. 
The greater part of the volume consists of an analysis of the figures and 
a description of the methodology and assumptions used in deriving them. 
Thus the first 6 of the 17 chapters in the book summarize the problem and 
the results, and the last 7 chapters present an admirably clear and detailed 
technical appendix on methodology. The intermediate chapters include two 
comparisons of the 1948-49 results, the first with the well-known estimates 
of the distribution of taxes and government benefits in Great Britain in 
1937 developed by Tibor Barna, and the second with United States estimates 
for a prewar and postwar year. 
The basic income distribution used by Cartter throughout the study is 
the frequency distribution of tax returns classified by taxable income brackets,
supplemented at the bottom of the income scale by an estimate of the
number of “tax-family heads” below the tax-exemption limit. To taxable 
income in each bracket Cartter adds nontaxable types of personal income as 
well as “nonpersonal income” which consists mainly of undistributed cor- 
porate profits. The latter item is added to match the corresponding inclusion 
of corporate profits taxes in his tax liability distribution. These additions 
serve to increase the amount of total income in each taxable income bracket 
so that average incomes are sometimes above the upper limit of the bracket. 
Like other workers in this field, Cartter makes no attempt to shift the tax 
returns to income brackets comparable in definition to the income aggregates 
within the brackets, although he makes some adjustment for this factor in 
discussing his results. 

Cartter’s use of income tax returns as the unit of classification in his 
income distribution raises an interesting point of difference between the 
statistical work on U. S. and U. K. incomes. Most of the recent analysis of 
tax incidence in this country has been in terms of an income size distribution 
of spending units or family units (e.g., the studies by Musgrave, Tucker, 
Adler, and the U. S. Departments of Commerce and Labor), whereas workers
on British tax incidence apparently feel that tax returns are reasonably 
close to a family or spending unit classification and use them as such (e.g., 
Cartter, Barna, Seers, and the Central Statistical Office). Perhaps this is the 
case, but the available figures are somewhat puzzling. Thus there were about 
14½ million private households in Great Britain in 1951 according to the
Census of that year as compared with the 23 million tax return units used
by Cartter (including the 3 million family heads he adds at the bottom of the
income scale). If the 23 million figure approximates the number of spending
units in Great Britain, then the 2 to 3 ratio of private households to spending
units that obtains is far different than that found in the United States.
In 1950, for example, there were 43½ million private households in this country,
and, according to the Survey of Consumer Finances, some 52 million
spending units, a ratio of about 5 to 6. Do the two countries really differ so 
much in this respect or is this another instance where comparisons are dis- 
torted by differences in definitions? At any rate this reviewer, who has strug- 
gled with the problem of converting U. S. tax returns into family units (to 
allow for the large number of “supplementary family earners” filing their own 
returns), would welcome a discussion as to why comparable adjustments are 
not required in the U. K. statistics.
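
The two ratios compared in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes the household counts are to be read as 14½ and 43½ million; the variable and function names are illustrative only and do not come from Cartter or from the Survey of Consumer Finances.

```python
# Figures quoted in the review, in millions. The halves are an assumed
# reading of the printed fractions.
UK_HOUSEHOLDS = 14.5      # private households, Great Britain, 1951 Census
UK_TAX_UNITS = 23.0       # tax return units used by Cartter
US_HOUSEHOLDS = 43.5      # private households, United States, 1950
US_SPENDING_UNITS = 52.0  # spending units, Survey of Consumer Finances


def households_per_unit(households, units):
    """Return private households per spending (or tax return) unit."""
    return households / units


uk_ratio = households_per_unit(UK_HOUSEHOLDS, UK_TAX_UNITS)
us_ratio = households_per_unit(US_HOUSEHOLDS, US_SPENDING_UNITS)

if __name__ == "__main__":
    print(f"Great Britain: {uk_ratio:.3f} (2 to 3 would be {2 / 3:.3f})")
    print(f"United States: {us_ratio:.3f} (5 to 6 would be {5 / 6:.3f})")
```

On these figures the British ratio falls slightly below 2 to 3 and the American ratio sits very close to 5 to 6, which is the gap the reviewer finds puzzling.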

Cartter’s allocations of the various taxes among income brackets raise a 
number of questions that recall the lively discussion on this subject that en- 
sued after Richard Musgrave published his estimates in the National Tax
Journal in 1951. Taxes on corporate profits, for example, are allocated among
income brackets by Cartter (as are undistributed corporate profits) on the 
basis of estimated holdings of shares. Musgrave, in his standard case, as- 
sumed that part of the comparable U. S. tax was shifted to consumers, part 
was shifted back to wage earners, and only the balance, somewhat over one-half,
was allocated on the basis of shareholdings. Cartter’s methodology results
in corporate profits taxes which are progressive throughout the income
scale (except for the lowest bracket), whereas Musgrave found them progressive
in the United States only in the income range above $7,500. While
this reviewer is inclined to favor Cartter’s procedure, the main point is that 
differences in the findings for the two countries must be examined with care. 

The most interesting chapter in the book is that comparing the 1948-49 
results with Tibor Barna’s results for 1937. The conclusion is that the distribution
of income before taxes and benefits was slightly less unequal in
postwar than in prewar Britain, and that the redistribution of income, 
measured by comparing income minus taxes plus benefits in the two years, 
was more effective in reducing inequality in 1948-49 than in 1937, although 
not markedly so. 

The other chapter on comparisons—with United States figures—is much 
less satisfactory. For this country Cartter uses estimates of the distribution 
among income brackets of spending units, income, taxes, and government 
benefits developed by John Adler, one of the contributors to Fiscal Policies 
and the American Economy. But Adler’s U. S. figures present the surprising 
conclusion that income inequality in the United States (before taxes and 
benefits) was practically the same in 1946-47 as in 1938-39. Kuznets and 
others working in this field have found, of course, that this was not the 
case; inequality as measured by the relative shares of total personal income 
received by the top 5 per cent of the population has since been shown by 
Kuznets to have declined markedly since 1939. Although Cartter notes 
Kuznets’ findings in a footnote (Adler’s figures were published in 1951 and 
Kuznets’ in 1953), a better decision in view of the differences in the findings 
would have been to discard the entire comparison of the two countries, or 
better still to postpone such a comparison to a later book. Cartter finds that 
the decline in relative income inequality between the late 1930’s and the 
late 1940’s was greater in Great Britain than in the United States when he 
uses Adler’s estimates, but that the reverse is the case when he uses Kuznets’ 
series. Obviously the conclusion rests on the choice of figures, a point on 
which Cartter takes no stand. 

A second study that would compare income distribution changes in the 
two countries would be a most interesting contribution that appears to be 
feasible in view of recent studies of tax incidence in this country. Attempts 
to estimate the distribution of benefits received among income brackets are
in large part conjectural and this reviewer, like Harold M. Groves, believes 
them to be a fairly unprofitable occupation. But a comparative study of be- 
fore-tax income distribution and of tax incidence in the United States and the 
United Kingdom would be of real significance and one which Cartter is 
well qualified to produce. 





An Essay on the Economic Theory of Rank. R. H. Tuck. Oxford: Basil Black- 
well, 1954. Pp. 52. Seven shillings six pence. 


Robert M. Solow, Massachusetts Institute of Technology


Frequency distributions of economic magnitudes (personal income and
wealth, corporate earnings, the size of firms, the length of stretches of
unemployment) nearly all tend to be highly skewed. Apart from the problems
of statistical inference thus created, this fact offers a theoretical
challenge. Given that human abilities are roughly Gaussian, as are most of the
other physical and biological conditioning factors, what process generates 
out of this symmetrical raw material the gross asymmetry of the observed 
economic facts? This is an old problem. Pareto’s¹ celebrated hyperbolic
distribution curve was originally offered as a purely empirical fit, but various
rationales for it have since been found, most recently by Champernowne.²
Perhaps the commonest explanation at the present leads to the logarithmic-
normal distribution as prototype. The reasoning is essentially what Gibrat³
called the law of proportional effect; incomes are generated by a sort of dif-
fusion process except that independent random shocks have effects propor-
tional to the displacement already achieved. The result is naturally a cen-
tral limit theorem for log income. J. C. Kapteyn⁴ and S. D. Wicksell⁵ worked
along these lines years ago, and it is still being developed. H. L. Moore⁶
wrote on this general problem, and Milton Friedman⁷ has recently put for-
ward a somewhat different sort of hypothesis. In contrast to the diffusion
type of argument there have been some recent efforts⁸ to use discrete ran-
dom processes for a model.
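
The law of proportional effect described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a reconstruction of Gibrat's or any other author's procedure: the shocks are taken to be independent and uniform on a symmetric percentage range, everyone starts at the same income, and the function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def simulate_incomes(n_people=5000, n_periods=50, max_shock=0.2, seed=0):
    """Simulate incomes built up from independent proportional shocks."""
    rng = random.Random(seed)
    incomes = []
    for _ in range(n_people):
        income = 1.0
        for _ in range(n_periods):
            # Each shock changes income by a random percentage of its
            # current level, not by a fixed absolute amount.
            income *= 1.0 + rng.uniform(-max_shock, max_shock)
        incomes.append(income)
    return incomes


def skewness(values):
    """Return the moment coefficient of skewness (0 for a symmetric sample)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


if __name__ == "__main__":
    incomes = simulate_incomes()
    log_incomes = [math.log(v) for v in incomes]
    print(f"skewness of income:     {skewness(incomes):.2f}")
    print(f"skewness of log income: {skewness(log_incomes):.2f}")
```

Because log income is a sum of many independent terms, it comes out nearly symmetric, while income itself is strongly skewed to the right. This is the sense in which symmetric raw material can generate an asymmetric distribution.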

Tuck’s essay is in this general line of descent. He concentrates primarily 
on the distribution of firms by size (number of employees) and his reasoning 
about incomes is rather more casual. The basic theoretical structure goes 
something like this: It is the nature of human organizations to be hierarchi- 
cal. There is a limit to the number of subordinates one man can supervise. 
Hence as a firm grows in size it piles up layers of executives: one at the top 
supervising, say, three vice-presidents, each of whom in turn supervises 
three vice-vice-presidents, and so forth down to the basic operatives. Oc- 
cupational rank can be measured by the number of layers between oneself 
and the dirty work. Now individuals enter this hierarchy and diffuse through 
it according to their initial training, their native ability, and their good
fortune. But because technologies differ among industries, the relation between
rank and the attached income will also differ. In a state of equilibrium
it is to be expected that the distribution of income will be about the same
in all industries, else entrants would be attracted from the less-favored to the
more-favored industries. But this implies that the distribution of occupa-
tional ranks will differ from industry to industry. Now the assumption is
made that in any industry there will be more individuals at any given rank
than can be efficiently supervised by the available people of the next highest
rank. Tuck concludes (or assumes) that the excess individuals at any rank
will split off and found independent firms of a size just so large that a person
of that particular rank can function as chief executive. This provides em-
ployment for a chain of people of lower rank, and again the superfluity drains
off into independent firms. Continuing this process to the end, a distribution
of firms by size is determined. On the further assumption that in each
industry log income is proportional to rank, an income distribution follows.

1 Pareto, Vilfredo, Cours d’Economie Politique, Lausanne: F. Rouge, 1897, Vol. 2, Pp. 299-345.

2 Champernowne, D. G., “A model of income distribution,” Economic Journal, LXIII (1953), 318-51.

3 Gibrat, R., Les Inégalités Economiques, Paris: Librairie du Recueil Sirey, 1931.

4 Kapteyn, Jacobus C., Skew Frequency-curve in Biology and Statistics, Groningen: Astronomical
Laboratory, 1903.

5 Wicksell, Sven Dag, The Genetic Theory of Frequency.

6 Moore, Henry Ludwell, Laws of Wages, New York: Macmillan, (1911), 71-103.

7 Friedman, Milton, “Choice, chance, and the personal distribution of income,” Journal of Political
Economy, LXI (August 1953), 277-90.

8 Bernardellis, Harro, “The stability of income distributions,” Sankhya, 6 (Part 4, 1943), 351-62.
Solow, Robert, The Dynamics of the Income Distribution, unpublished thesis, 1951, Harvard University
Library.
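
The hierarchical structure summarized in this review lends itself to a small calculation. The sketch below assumes a constant span of control of three (the review's own "say, three"), a completely staffed hierarchy, and a hypothetical base income and growth factor per rank; the function names and the values 100 and 1.5 are illustrative and are not Tuck's figures. It does not model the splitting off of excess individuals into new firms.

```python
def firm_size(chief_rank, span=3):
    """Employees in a fully staffed firm whose chief executive has the given rank.

    Rank counts the layers between a person and the basic operatives, so
    operatives have rank 0 and a chief of rank r heads r + 1 layers.
    """
    if chief_rank < 0 or span < 1:
        raise ValueError("chief_rank must be >= 0 and span must be >= 1")
    # One chief, `span` subordinates, span**2 below them, and so on.
    return sum(span ** layer for layer in range(chief_rank + 1))


def income_at_rank(rank, base_income=100.0, growth=1.5):
    """Income when log income rises linearly with rank (hypothetical values)."""
    return base_income * growth ** rank


if __name__ == "__main__":
    for rank in range(5):
        print(rank, firm_size(rank), round(income_at_rank(rank), 2))
```

Under these assumptions firm size grows geometrically with the rank of the chief executive, so a modest spread in the ranks people attain implies a wide and skewed spread in firm sizes.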

One gets nowhere in this business except by starting with highly simplified 
assumptions, and I don’t mean to criticize Tuck for doing what is necessary 
and doing it in an original way. But the particular assumptions he has chosen 
do strike me as none too sturdy a foundation to support the weight of some 
of the strongest empirical regularities in economics. It seems somewhat far- 
fetched to believe that firms come into being, grow, and contract in response 
to the availability of people at certain occupational ranks. Nor does the 
picture of the capital market implicit in the theory seem to capture reality 
in a way that would satisfy most economists. But no star is wholly lost, and 
some of Tuck’s ideas provide a valuable insight into the nature of inter- 
industrial occupational equilibrium. 

The argument is supported by fits of the theoretical distribution to the 
observed pattern of firm sizes in the factory trades in the U. K. as of 1935, 
and to the iron and steel and non-ferrous metals trades in particular. The 
fit is impressively good, but one’s awe is somewhat mitigated by the fact 
that the theoretical distribution contains six free parameters in addition to 
the sample size. 


Demand for Meat. Elmer J. Working. Chicago: The Institute of Meat Packing, 
1954. Pp. xi, 136. $1.00. Paper. 


J. A. Nordin, Iowa State College


This monograph is directed at prediction of the prices of beef, pork, and all
meat as a composite commodity. Single equation models are used. The 
demand equations are fitted on the basis of the period 1922-1941, and post- 
war predictions are made. 

Working has given us a careful and resourceful analysis of the forces af- 
fecting the sales of meat. The area of study is one in which there is consider- 
able room for disagreement on broad methodological grounds; some such 
disagreement is indicated below. But though it is not to be expected that the
author’s decisions on methodology will command unanimous assent, it is
clear that they have enabled him to make another stimulating contribution 
to price analysis. 

The author’s principal conclusion pertains to the difference between long 
run and short run elasticity of the demand for meat. Price is made a function 
of both current consumption and past consumption, in effect. The short run 
price elasticity of demand is derived from the partial effect of current con- 
sumption. The long run price elasticity is derived on the assumption that 
current consumption equals average consumption per capita over the past 
ten years. The author estimates the former elasticity at 0.75, and the latter 
at 1.25. He suggests that restricting meat production might increase farmers’ 
incomes in the short run but reduce them in the long run. 
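
The link between these two elasticities and farmers' incomes can be illustrated with a constant-elasticity demand curve. This is a minimal sketch, assuming that gross receipts (price times quantity) stand in for farm income, that the elasticity is constant over the range considered, and a hypothetical 10 per cent cut in output. The function name is illustrative, and nothing here reproduces Working's fitted equations.

```python
def receipts_ratio(quantity_ratio, elasticity):
    """Ratio of new to old receipts after quantity changes by quantity_ratio.

    With a constant price elasticity of demand e (taken as a positive number),
    price moves as quantity ** (-1 / e), so receipts = price * quantity move
    as quantity ** (1 - 1 / e).
    """
    if quantity_ratio <= 0 or elasticity <= 0:
        raise ValueError("quantity_ratio and elasticity must be positive")
    return quantity_ratio ** (1.0 - 1.0 / elasticity)


if __name__ == "__main__":
    cut = 0.90  # hypothetical 10 per cent restriction of output
    for label, e in (("short run", 0.75), ("long run", 1.25)):
        change = (receipts_ratio(cut, e) - 1.0) * 100.0
        print(f"{label}: elasticity {e:.2f}, receipts change {change:+.1f} per cent")
```

With an elasticity below one, the price rise more than offsets the smaller quantity, so receipts rise; with an elasticity above one, receipts fall. On the assumptions stated, a 10 per cent cut raises receipts by roughly 3.6 per cent at an elasticity of 0.75 and lowers them by roughly 2.1 per cent at 1.25.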

In this connection the author’s conceptions of demand and demand elastic- 
ity are interesting. He argues that, in the case of meat, consumption is the 
cause of price, the quantity offered for consumption being largely determined 
by events before the period in which the price is set (p. 8).¹ He implies that
in other problems the demand relation might be the more usual one repre- 
senting quantity consumed as the effect of price. 

The price elasticity of demand is written in the usual way, even though in 
the case of meats the numerator represents percentage change in a causing 
variable. 

If the demand equation for the ith good involves other individual goods, 
they are represented by quantities rather than prices. Thus in the case of 
pork the demand relation indicates how pork price responds to pork quantity 
given the quantities of other meats. 

Of course holding other prices constant is an alternative to holding other 
quantities constant. And recently Friedman and Bailey have discussed 
holding constant the consumer’s indifference index.²

At this point the general orientation of the monograph becomes sig- 
nificant. The study is a study in price analysis, and relates primarily to price 
prediction. But it can be argued that prediction of itself is not highly im- 
portant. What is important is the action-choice based on the prediction or 
analysis. Thus if the purpose behind the analysis is that of choosing between
two price-fixing systems, and if we are concerned with the result of varying 
the price of pork while keeping other meat prices constant, then the usual 
definition of demand (in terms of response of quantity to price) is reasonable. 
If we are primarily concerned with consequences of output restriction, then 
the author’s definition of demand is appropriate. If we plan to compensate a 
social group for changes damaging its members the Friedman-Bailey approach
is reasonable. Specifying the use to which the work is to be put seems
indispensable.

1 It is not evident why he emphasizes this view, since later (p. 39) he dissociates “dependent” and
“independent” from causality, and in estimating the coefficient connecting a pair of variables uses a geo-
metric mean of the coefficients derived by minimizing the sum of squares in several directions (p. 40).

2 Friedman, Milton, “The Marshallian demand curve,” Journal of Political Economy, 57 (1949),
463-95; Bailey, Martin J., “The Marshallian demand curve,” Journal of Political Economy, 62 (1954),
255-60.

The author reaches conclusions about both price control and output restrictions
(p. 4). But the study does not appear to have been organized with
the specific objective of facilitating choice among actions pertaining to 
either price control or output restriction. It is not clear that each methodo- 
logical decision has been made on the basis of its effect on specified action 
problems. Of course it is hard to guess how much such a procedure would 
have changed the results; certainly the author’s findings, such as those 
dealing with the differences between long and short run demand elasticities, 
appear to be very useful. 

A second important conclusion is that “when the price level is rising, if 
meat prices only keep pace with the average price level of all consumer’s 
goods, consumers will demand an increased amount of meat. Consequently, 
during a period of inflation and rising prices, if there is no increase in meat 
supplies available to consumers, meat prices, being flexible and dependent 
upon supply and demand in the short run as well as in the long run, will 
rise more rapidly than the general level of commodity prices” (p. xi). Thus 
in one regression analysis (p. 42) when meat consumption and deflated dis- 
posable income are held constant a one per cent rise in the price level is asso- 
ciated with a 1.17 per cent rise in the price of meat. This conclusion and 
evidence might well be examined further since it is not clear a priori why 
inflation should bring about such results. 

A third important conclusion is that the notoriously difficult prediction of 
postwar prices on the basis of interwar functions is greatly facilitated by 
including an independent variable to represent the average of the past ten 
years’ deflated per capita incomes. Thus long continued changes in real 
income are said to be more significant for meat prices than are equal real 
income changes of shorter duration. This conclusion appears to be well 
founded a priori, and introducing a variable for deflated per capita dis- 
posable income for the previous ten years appears to bring about a very 
significant improvement in the ability of an interwar model to predict post- 
war prices. 
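
A lagged variable of the kind described here is straightforward to construct. The sketch below assumes that the average covers the ten years preceding the year being explained and excludes the current year, which the review does not state explicitly; the function name and the income series are hypothetical.

```python
def trailing_mean(series, window=10):
    """For each position t, average the `window` values preceding t.

    Returns None where fewer than `window` earlier observations exist.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    for t in range(len(series)):
        if t < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(series[t - window:t]) / window)
    return out


if __name__ == "__main__":
    # Hypothetical deflated per capita incomes for twelve consecutive years.
    incomes = [500, 510, 505, 520, 530, 525, 540, 560, 555, 570, 600, 640]
    for year, lagged in enumerate(trailing_mean(incomes)):
        print(year, incomes[year], lagged)
```

A variable built this way responds slowly to a change in income, so a rise that persists for several years moves it far more than a rise of the same size that is quickly reversed.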

Although not directly involved in the author’s main conclusions several 
other issues are interesting. “Demand Index A” and “Demand Index B” 
have been used as dynamic indexes of the demand for meat. The A index is 
the result of dividing an index of per capita disposable income by an index
of the slow-moving components of the Consumers’ Price Index (p. 52). In
this context “dynamic” refers to a situation in which a variable’s effect 
depends upon its rate of change or upon the length of time which has elapsed 
since a change occurred. Index A is said to be a combined index of disposable 
income and the rate of change of prices. At this point (p. 52) further dis- 
cussion would have been helpful; it does not seem that index A can do what 
the author has designed it to do. How does an index of slow-moving prices 
provide an indication of the rate of change of all prices? 

Index B differs from index A only by dealing with expenditure in place of
income. Substitution of index B for index A improves the predictive power
of the interwar equations; the postwar peculiarity seems to be high total 
expenditure relative to income, rather than high meat expenditure relative 
to total expenditure. Yet it may not be desirable to use index B, since using 
it requires information we are not likely to have when we want to make pre- 
dictions. Perhaps it is partly for this reason that the author regards the use 
of index B as a step on the way to the use of a lagged income variable. 

Least squares methods are used throughout, largely on the basis of the 
contention that the superiority of the simultaneous equations method can- 
not be shown if the simultaneous equations assumptions (e.g., the assump- 
tion that there are no errors in the predetermined variables) are not met 
(p. 26). The only use of simultaneous equations methods appears in an 
appendix written by Vincent West. West sets up equations for the demand 
for meat, supply of meat, and income. The equations are linear in the original 
variables. The demand equation is over-identified, and full maximum likeli- 
hood procedure is used in getting estimates of the structural coefficients. 
The estimates of the coefficients in the demand equation differ only slightly 
from the estimates of the coefficients when the same demand equation is 
subjected to least squares procedure. Apparently (p. 26) Working decided 
against using simultaneous methods partly on the basis of West’s work. 

West’s model differs from Working’s markedly: the variables used and the 
form of the equations are different for the two studies, for instance. It would 
be interesting to see the results of applying simultaneous methods to models 
more closely related to Working’s model. If Working is right about the sig- 
nificance of dynamic elements, for instance, then the fact that the two basic 
methods give similar results for non-dynamic models may not be significant. 
(Of course the suggestion that there is further work to be done does not 
detract from the significance of what the author has done in the present 
monograph.) 

Multicollinearity is discussed at several points in the monograph. There 
is a suggestion that in interwar analysis some difficulties may have been 
averted by the accident that real income did not vary much (p. 10). If it 
had varied more, its potential correlation with other independent variables 
might have caused the kind of trouble that has shown up in attempts to 
predict postwar demand. The use of time as an independent variable is con- 
sidered dangerous; if time is extraneous but is correlated with some of the 
other independent variables, then some of their influence will be attributed 
to time, so that later prediction will be poor (p. 45). Multicollinearity in- 
volving prices, populations, and real income interwar is said to have resulted 
in difficulty in ascribing a reasonable influence to each variable; postwar 
predictions are said to suffer, since the post-war correlations among these 
independent variables are lower than the interwar correlations (p. 59).

Working does not introduce time as a variable in his equations. Instead, 
he experiments with the use of first differences as a preliminary step in some 
of his investigations, while recognizing the fact that first differences tend to 
obscure relevant long-term relations (p. 45). His arguments against including
time as an independent variable appear cogent, although considerable
doubt remains about the selection of a substitute procedure. 

On all the significant issues, the author’s discussion should accelerate the 
improvement of our procedures; in this connection his treatment of lagged 
variables is especially noteworthy. His suggestion on the use of the data 
should provide significant aid to other researchers in the field. And his empirical
results represent an important addition to the stock of information
useful in decision-making relating to meats. 


Labor Productivity in Soviet and American Industry. Walter Galenson. New 
York: Columbia University Press, 1955. Pp. xiv, 273. $5.50. 


Simon Rottenberg, University of Chicago


In his introduction to this volume, Galenson enumerates its purposes:
“to trace the development of labor productivity in a number of Soviet 
industries since 1928; to compare productivity in these industries with that 
in their U. S. counterparts; and to arrive at some general conclusions on 
comparative labor productivity in Soviet and American industry.” The 
research was done as a project of the RAND Corporation and draws heavily
upon Soviet materials. 

Statistical comparison of labor productivity, either inter-temporally or 
inter-spatially, is a hazardous business, because change and non-uniformity 
persist in occurring. Those who venture to make comparisons must sometimes 
introduce heroic compensatory adjustments. 

In attempting this comparison of labor productivity in the United States 
and the USSR, Mr. Galenson encountered not only all of the conventional 
difficulties which attach to inter-spatial comparisons within a country; 
he found also the usual international comparative problem of expressing 
values in units of common currency and, to make matters worse, a whole 
range of difficulties, unique to the Soviet Union and a handful of other coun- 
tries, which are associated with the meaninglessness of valuation in “mar- 
kets” in which prices are administratively determined. 

To escape the valuation and conversion problems, he resorted to the ex- 
pression of output in terms of physical product, and this drove him to a 
position from which he was able to compute productivity differences be- 
tween the countries for only a few industries with “fairly homogeneous 
products.” 

He covers eight manufacturing and extractive industries—coal mining, 
iron ore mining, crude oil and natural gas, iron and steel, machinery, cotton 
textile manufacturing, shoe manufacturing, and beet sugar processing. 
Product diversity in the machinery case compelled him to resort to a system 
of makeshift pricing of Soviet output and to establish value of output per 
worker comparisons. In all other cases, he used the ratio of physical output 
to labor input. Comparisons are established for the immediate pre-World 
War II years and something is also said about productivity trends in the 
Soviet Union in the decade or so preceding this and in the postwar period. 





Even physical output-labor input ratios yielded results which contain 
errors of some magnitude because the products of “homogeneous product 
industries” are not at all homogeneous when they are carefully examined. 
Differences are sometimes intrinsic. Men’s shoes are not like children’s 
shoes and shoes made of leather are not like those made of canvas and rubber 
and required labor inputs vary among them. Since the product mix of “an 
industry” varies among countries, international productivity comparisons 
come to be made for different, and not the same, industries. 

The construction of meaningful physical comparisons requires the dis- 
covery of tolerable equivalents which express different products in some 
common unit. Sometimes, equivalence is found by assumption (as when an 
author says, “I shall ignore this difference, because it does not seem to be of 
sufficient magnitude to matter”) and, sometimes, it is created by corrective 
adjustment. Even when differences are not intrinsic, in the sense just dis- 
cussed, they occur because censal criteria are different among countries. 

Mr. Galenson has had to contend with both kinds of non-homogeneity. 
He has been explicitly conscious of the problem and has manipulated his 
data with care. The results show labor productivity to have been lower in 
the USSR than in the United States, in the late thirties, in all the compared 
industries. Soviet labor productivity ranged from a low of 15 per cent of 
American in heavy construction machinery manufacturing to a high of 
58 per cent in tractor manufacture. The median value in a set of sixteen was 
46 per cent. 

The most valuable parts of Mr. Galenson’s study are the carefully- 
prepared materials on physical outputs and inputs in the selected industries. 
Their preparation, which required the combing of Russian-language publi- 
cations, makes them available, in appropriately adjusted form, to people 
who cannot read Russian. The least valuable parts are his attempts at ex- 
planation of observed differences. Taken all together, the book is an im- 
portant addition to the literatures of the Soviet economy and of international 
productivity comparison. 


Diagrams in Punched Card Computing. Fred Gruenberger. Madison, Wisconsin: 
University of Wisconsin Press, 1954. Pp. 18 text, 108 wiring diagrams. $3.75.
Loose-leaf.


R. Zroua, Dominion Bureau of Statistics 


This publication is in two parts, a very brief text followed by a collection
of 108 wiring diagrams. The text deals with problems in the preparation 
and use of punched cards, operating principles of the IBM 602A and 604, and 
explanations of some of the diagrams shown in the second part of the publi- 
cation. Wiring diagrams are mostly for specialized operations on the IBM 
077, 402, 405, 416, 417, 513, 521, 602A, 604, and CPC Model I, with some 
reference to Remington Rand equipment. These include diagrams for testing 
machines; calculating a correlation matrix, square root, variance, factor
analysis, etc.; merging, selecting, and matching of cards; and other jobs of a
computing nature.







The 18-page text touches somewhat superficially on a variety of punched 
card and punched card equipment topics. It includes a few sentences on
each of such subjects as: detecting flaws in new cards, locating punching 
errors, testing equipment, guarding against intermittent machine trouble 
and operator error, determining what not to compute on punched card 
equipment, and other related topics. 

The collection of diagrams gives some indication of what scientific
computations are possible on punched card equipment. Whether or not the
machines suggested are the most efficient way of carrying out such complex 
computations, however, is another and more difficult question. 

While the author mentions instruction, along with statistical and scien- 
tific computation and research in punched card techniques, as a function of 
a punched card installation, the fact that explanations of wiring diagrams 
are very brief or omitted entirely limits the use of this publication for training 
purposes. In addition the publication deals chiefly with the solution of 
problems of a scientific nature which require a high degree of accuracy and 
involve relatively few punched cards for any one problem; some of the pro- 
cedures suggested do not apply to processing of punched cards generally. 

The publication is in loose-leaf form and it appears that supplements, 
chiefly new diagrams, will be published from time to time. 


Introducción a los métodos de la estadística. (Segunda parte.) Sixto Ríos.
Madrid: 1954. Pp. viii, 193-434.* Paper. 


Paul R. Halmos, University of Chicago


Continuing in the style and at the level established in the first part of
this book, the author discusses an impressively long and varied list of
topics. Among them are: efficient and sufficient statistics, the method of 
maximum likelihood, confidence intervals, tests of hypotheses, decision 
functions, sequential analysis, non-parametric estimation, quality control, 
the analysis of variance, design of experiments, and stochastic processes. 
There is a long appendix (41 pages) on operations research, in which topics 
such as the theory of queues and the Monte Carlo method are mentioned.
On the whole the work gives the impression of being more eclectic than 
selective. There are (regrettably) not so many exercises in this part of the 
book as in the earlier part. The mathematical level is about the same; such 
things as Kolmogoroff’s treatment of probability via set functions receive a 
brief mention only, and that near the end of the book. In sum: a good bird’s 
eye view, but likely to be frustrating as a source for learning the details of 
the subject. 





* In the review of the first part of the book, in this Journal, 1953, p. 154, the number of pages was 
incorrectly printed. Instead of pp. 205, it should have been pp. xiii, 192. After the customary front mat- 
ter, the pagination of the second part continues that of the first. 







National Income—1954 Edition—A Supplement to the Survey of Current Busi- 
ness. United States Department of Commerce, Office of Business Economics. 
Washington: U. S. Government Printing Office. 1954. Pp. v, 249. $1.50. Paper. 


Its foreword describes this volume as follows: “Since publication in 1934
of the first of a series of national income reports by the Department of
Commerce, steady progress has been achieved in extending the scope of the 
estimates, in improving their quality, and in making them available prompt- 
ly, as well as in sharpening the concepts. A principal contribution of the 
present report—which is closely similar in form to the 1951 National
Income supplement so as to facilitate use by those familiar with that volume 
—is the presentation of estimates incorporating data collected in the post- 
war industrial and population censuses. 

“In the preparation of these new estimates, opportunity was also taken to 
rework many of the income and product series for the entire period back to 
1929 in order to reflect additional data sources and improvements in esti- 
mating techniques. A special feature is the presentation of constant-dollar 
gross national product in 1947 prices instead of 1939 prices, as previously 
used. 

“The tables presented in this volume incorporate the results of the first 
comprehensive review of sources and methods since the initial publication 
of the national income statistics in the form of an economic accounting
system in the 1947 National Income supplement. While the changes that have
been introduced do not alter the over-all picture of the United States econ- 
omy afforded by the income and product accounts, they improve the data 
in many detailed aspects. 

“The text material in the 1951 volume also has undergone review. This 
resulted principally in reworking the descriptions of statistical sources and 
methods to accord with the new estimates and bringing up to date the sum- 
mary of economic developments. 

“The statistical changes have been examined for the light they shed on the 
reliability of the estimating techniques. This analysis confirms the adequacy 
of these techniques to produce reliable preliminary measures of national 
output and its major components on the basis of incomplete information. 
Revisions for some of the more detailed components, however, were sub- 
stantial and underscore the need which we have repeatedly stressed for 
further development of the primary data sources on which the national
income estimates are based. 

“The present report contains all the national income statistics of the 
Office of Business Economics except the annual series on income by States 
and the distributions of family income by size classes. With these exceptions 
it supersedes all previously published figures, and the series contained in 
this volume will be kept up to date in the monthly Survey of Current
Business.”
W.A.W. 





THE STATE OF FOOD AND AGRICULTURE 1955 


Review of a Decade and Outlook 
Rome, September 1955 
Approximately 200 pages, numerous graphs and tables. 
US $2.50 or 12s. 6d. 
Unlike its predecessors, this year’s report on the state of food and agriculture does not deal mainly
with the current situation and short-term outlook, but instead reviews the progress and experiences
of the whole decade. The treatment of this theme is analytical rather than descriptive and an attempt
is made to bring out the underlying causes of the main postwar developments and to form some
appraisal of the results achieved. The main issues and problems which lie ahead are discussed in a
final chapter.

Part II (in the same volume) reviews the production, consumption and trade developments commodity
by commodity (including forestry and fisheries) in the last ten years and gives short-term outlooks.





YEARBOOK OF FOOD AND AGRICULTURAL STATISTICS 1954 


Part I—Production 


Provides basic statistics on world agricultural
production. It gives authoritative information on
land use, agricultural population, crops, livestock
numbers and products, food supplies and their
utilization, and on commercial fertilizers, pesti-
cides, and agricultural machinery. It also in-
cludes the more important series of agricultural
commodity prices in many countries, as well as
index numbers of prices received and paid by
farmers, and of agricultural production.

(In preparation) 


Part II—Trade


A basic reference work on world trade in agri-
cultural products giving statistics of the imports
and exports of the major agricultural commodi-
ties. It includes regional and world totals, com-
puted from official and unofficial information.
For some major commodities, data are given by
trade season as well as by calendar year.


Price: Each part $3.50 or 17s. 6d.





YEARBOOK OF FISHERIES STATISTICS 1952/53 


Part I—Production and Craft


The fourth Fisheries Yearbook amplifies the part
on production and gives more detailed data on
craft than in earlier issues. It has therefore been
found necessary to publish it in two volumes.
Part I also contains a historical section on catch
from 1910 to 1953 broken down by species and
countries.


Part II—Trade


The statistical tables give the same data as in
previous years, but have been re-arranged fol-
lowing the Standard International Trade Clas-
sification.


Part II will be issued in the later part of 1955. 
Price: Each part $2.00 or 10s. 





YEARBOOK OF FOREST PRODUCTS STATISTICS 1954 


This yearbook, the eighth in the series, contains official information from more than 100 countries and
territories on production and trade for roundwood, processed wood, wood pulp, newsprint, paper and
paperboard, and fibreboards, as well as a summary of world trade.
Price: $2.50 or 12s. 6d.





MONTHLY BULLETIN OF AGRICULTURAL ECONOMICS AND STATISTICS


This monthly bulletin is the authoritative international publication of up-to-date statistics on all
major aspects of the world’s food and agriculture situation, especially production, trade and prices.
Data for the most important commodities, e.g., wheat and rice, are shown at least three times a
year, and data for other commodities once or twice a year according to their importance. Each issue
contains articles, commodity notes and statistical tables.

Editions in English, French and Spanish.
Annual Subscription: $5.00 or 25s.
Single issues: $0.50 or 2s. 6d.





CATALOGUE OF PUBLICATIONS
1945-1954


FAO Documents Service and Sales Agents
in all parts of the world are stocked with
this new catalogue and bulletins describing
new titles. These will be sent on receipt of
a postcard indicating your particular in-
terests.


All FAO publications are available
in U.S.A. from FAO's sales agent:


COLUMBIA UNIVERSITY PRESS
International Documents Service
2960 Broadway, NEW YORK 27, N.Y.


Please mention the Journal of the American Statistical Association in writing advertisers





